@c ***********************************************************************
@node GNUnet Developer Handbook
@chapter GNUnet Developer Handbook

This book is intended to be an introduction for programmers who want to
extend the GNUnet framework. GNUnet is more than a simple peer-to-peer
application.

For developers, GNUnet is:

@itemize @bullet
@item developed by a community that believes in the GNU philosophy
@item Free Software (Free as in Freedom), licensed under the
GNU General Public License
@item A set of standards, including coding conventions and
architectural rules
@item A set of layered protocols, both specifying the communication
between peers as well as the communication between components
of a single peer
@item A set of libraries with well-defined APIs suitable for
writing extensions
@end itemize

In particular, the architecture specifies that a peer consists of many
processes communicating via protocols. Processes can be written in almost
any language.
C and Java @footnote{As well as Guile} APIs exist for accessing existing
services and for writing extensions.
It is possible to write extensions in other languages by
implementing the necessary IPC protocols.

GNUnet can be extended and improved along many possible dimensions, and
anyone interested in Free Software and Freedom-enhancing Networking is
welcome to join the effort. This Developer Handbook attempts to provide
an initial introduction to some of the key design choices and central
components of the system.
This part of the GNUnet documentation is far from complete,
and we welcome informed contributions, be it in the form of
new chapters, sections or insightful comments.

@menu
* Developer Introduction::
* Code overview::
* System Architecture::
* Subsystem stability::
* Naming conventions and coding style guide::
* Build-system::
* Developing extensions for GNUnet using the gnunet-ext template::
* Writing testcases::
* TESTING library::
* Performance regression analysis with Gauger::
* TESTBED Subsystem::
* libgnunetutil::
* Automatic Restart Manager (ARM)::
* TRANSPORT Subsystem::
* NAT library::
* Distance-Vector plugin::
* SMTP plugin::
* Bluetooth plugin::
* WLAN plugin::
* ATS Subsystem::
* CORE Subsystem::
* CADET Subsystem::
* NSE Subsystem::
* HOSTLIST Subsystem::
* IDENTITY Subsystem::
* NAMESTORE Subsystem::
* PEERINFO Subsystem::
* PEERSTORE Subsystem::
* SET Subsystem::
* STATISTICS Subsystem::
* Distributed Hash Table (DHT)::
* GNU Name System (GNS)::
* GNS Namecache::
* REVOCATION Subsystem::
* File-sharing (FS) Subsystem::
* REGEX Subsystem::
@end menu

@node Developer Introduction
@section Developer Introduction

This Developer Handbook is intended as a first introduction to GNUnet for
new developers who want to extend the GNUnet framework. After the
introduction, each of the GNUnet subsystems (directories in the
@file{src/} tree) is (supposed to be) covered in its own chapter. In
addition to this documentation, GNUnet developers should be aware of the
services available to them on the GNUnet server.

New developers can have a look at the GNUnet tutorials for C and Java
available in the @file{src/} directory of the repository or under the
following links:

@c ** FIXME: Link to files in source, not online.
@c ** FIXME: Where is the Java tutorial?
@itemize @bullet
@item @uref{https://gnunet.org/git/gnunet.git/plain/doc/gnunet-c-tutorial.pdf, GNUnet C tutorial}
@item GNUnet Java tutorial
@end itemize

In addition to the GNUnet Reference Documentation you are reading,
the GNUnet server at @uref{https://gnunet.org} contains
various resources for GNUnet developers and those
who aspire to become regular contributors.
They are all conveniently reachable via the "Developer"
entry in the navigation menu. Some additional tools (such as static
analysis reports) require special developer access to perform certain
operations. If you want (or require) access, you should contact
@uref{http://grothoff.org/christian/, Christian Grothoff},
GNUnet's maintainer.

The public subsystems on the GNUnet server that help developers are:

@itemize @bullet

@item The version control system (git) keeps our code and enables
distributed development.
It is publicly accessible at @uref{https://gnunet.org/git/}.
Only developers with write access can commit code; everyone else is
encouraged to submit patches to the
@uref{https://lists.gnu.org/mailman/listinfo/gnunet-developers, GNUnet-developers mailing list}.

@item The bugtracking system (Mantis).
We use it to track feature requests, open bug reports and their
resolutions.
It can be accessed at @uref{https://gnunet.org/bugs/}.
Anyone can report bugs, but only developers can claim to have fixed them.

@item Our site installation of the
CI@footnote{Continuous Integration} system @code{Buildbot} is used
to check GNUnet builds automatically on a range of platforms.
The web interface of this CI is exposed at
@uref{https://gnunet.org/buildbot/}.
Builds are triggered automatically 30 minutes after the last commit to
our repository was made.

@item The current quality of our automated test suite is assessed using
code coverage analysis. This analysis is run daily; however, the webpage
is only updated if all automated tests pass at that time. Testcases that
improve our code coverage are always welcome.

@item We try to automatically find bugs using a static analysis scan.
This scan is run daily; however, the webpage is only updated if all
automated tests pass at that time. Note that not everything that is
flagged by the analysis is a bug, sometimes even good code can be marked
as possibly problematic. Nevertheless, developers are encouraged to at
least be aware of all issues in their code that are listed.

@item We use Gauger for automatic performance regression visualization.
Details on how to use Gauger are here.

@item We use @uref{http://junit.org/, junit} to automatically test
@command{gnunet-java}.
Automatically generated, current reports on the test suite are here.

@item We use Cobertura to generate test coverage reports for gnunet-java.
Current reports on test coverage are here.

@end itemize



@c ***********************************************************************
@menu
* Project overview::
@end menu

@node Project overview
@subsection Project overview

The GNUnet project consists at this point of several sub-projects. This
section is supposed to give an initial overview about the various
sub-projects. Note that this description also lists projects that are far
from complete, including even those that have literally not a single line
of code in them yet.

GNUnet sub-projects in order of likely relevance are currently:

@table @asis

@item @command{gnunet}
Core of the P2P framework, including the file-sharing, VPN and
chat applications; this is most of what this Developer Handbook covers.
@item @command{gnunet-gtk}
Gtk+-based user interfaces, including:

@itemize @bullet
@item @command{gnunet-fs-gtk} (file-sharing),
@item @command{gnunet-statistics-gtk} (statistics over time),
@item @command{gnunet-peerinfo-gtk}
(information about current connections and known peers),
@item @command{gnunet-chat-gtk} (chat GUI) and
@item @command{gnunet-setup} (setup tool for "everything")
@end itemize

@item @command{gnunet-fuse}
Mounting directories shared via GNUnet's file-sharing
on GNU/Linux distributions
@item @command{gnunet-update}
Installation and update tool
@item @command{gnunet-ext}
Template for starting 'external' GNUnet projects
@item @command{gnunet-java}
Java APIs for writing GNUnet services and applications
@c ** FIXME: Point to new website repository once we have it:
@c ** @item svn/gnunet-www/ Code and media helping drive the GNUnet
@c website
@item @command{eclectic}
Code to run GNUnet nodes on testbeds for research, development,
testing and evaluation
@c ** FIXME: Solve the status and location of gnunet-qt
@item @command{gnunet-qt}
Qt-based GNUnet GUI (is it deprecated?)
@item @command{gnunet-cocoa}
cocoa-based GNUnet GUI (is it deprecated?)
@item @command{gnunet-guile}

@end table

We are also working on various supporting libraries and tools:
@c ** FIXME: What about gauger, and what about libmwmodem?

@table @asis
@item @command{libextractor}
GNU libextractor (meta data extraction)
@item @command{libmicrohttpd}
GNU libmicrohttpd (embedded HTTP(S) server library)
@item @command{gauger}
Tool for performance regression analysis
@item @command{monkey}
Tool for automated debugging of distributed systems
@item @command{libmwmodem}
Library for accessing satellite connection quality
reports
@item @command{libgnurl}
gnURL (feature-restricted variant of cURL/libcurl)
@end table

Finally, there are various external projects (see links for a list of
those that have a public website) which build on top of the GNUnet
framework.

@c ***********************************************************************
@node Code overview
@section Code overview

This section gives a brief overview of the GNUnet source code.
Specifically, we sketch the function of each of the subdirectories in
the @file{gnunet/src/} directory. The order given is roughly bottom-up
(in terms of the layers of the system).

@table @asis
@item @file{util/} --- libgnunetutil
Library with general utility functions; all
GNUnet binaries link against this library. Anything from memory
allocation and data structures to cryptography and inter-process
communication. The goal is to provide an OS-independent interface and
more 'secure' or convenient implementations of commonly used primitives.
The API is spread over more than a dozen headers; developers should study
those closely to avoid duplicating existing functions.
@pxref{libgnunetutil}.
@item @file{hello/} --- libgnunethello
HELLO messages are used to
describe under which addresses a peer can be reached (for example,
protocol, IP, port). This library manages parsing and generating of HELLO
messages.
@item @file{block/} --- libgnunetblock
The DHT and other components of GNUnet
store information in units called 'blocks'. Each block has a type and the
type defines a particular format and how that binary format is to be
linked to a hash code (the key for the DHT and for databases). The block
library is a wrapper around block plugins which provide the necessary
functions for each block type.
@item @file{statistics/} --- statistics service
The statistics service enables associating
values (of type uint64_t) with a component name and a string. The main
uses are debugging (counting events), performance tracking and user
entertainment (what did my peer do today?).
@item @file{arm/} --- Automatic Restart Manager (ARM)
The automatic-restart-manager (ARM) service
is the GNUnet master service. Its role is to start gnunet-services, to
re-start them when they crash and finally to shut down the system when
requested.
@item @file{peerinfo/} --- peerinfo service
The peerinfo service keeps track of which peers are known
to the local peer and also tracks the validated addresses
(in the form of a HELLO message) for each of those peers. The peer is not
necessarily connected to all peers known to the peerinfo service.
Peerinfo provides persistent storage for peer identities --- peers are
not forgotten just because of a system restart.
@item @file{datacache/} --- libgnunetdatacache
The datacache library provides (temporary) block storage for the DHT.
Existing plugins can store blocks in SQLite, PostgreSQL or MySQL databases.
All data stored in the cache is lost when the peer is stopped or
restarted (datacache uses temporary tables).
@item @file{datastore/} --- datastore service
The datastore service stores file-sharing blocks in
databases for extended periods of time. In contrast to the datacache, data
is not lost when peers restart. However, quota restrictions may still
cause old, expired or low-priority data to be eventually discarded.
Existing plugins can store blocks in SQLite, PostgreSQL or MySQL databases.
@item @file{template/} --- service template
Template for writing a new service. Does nothing.
@item @file{ats/} --- Automatic Transport Selection
The automatic transport selection (ATS) service
is responsible for deciding which address (i.e.
which transport plugin) should be used for communication with other peers,
and at what bandwidth.
@item @file{nat/} --- libgnunetnat
Library that provides basic functions for NAT traversal.
The library supports NAT traversal with
manual hole-punching by the user, UPnP and ICMP-based autonomous NAT
traversal. The library also includes an API for testing if the current
configuration works and the @code{gnunet-nat-server} which provides an
external service to test the local configuration.
@item @file{fragmentation/} --- libgnunetfragmentation
Some transports (UDP and WLAN, mostly) have restrictions on the maximum
transfer unit (MTU) for packets. The fragmentation library can be used to
break larger packets into chunks of at most 1k and transmit the resulting
fragments reliably (with acknowledgement, retransmission, timeouts,
etc.).
@item @file{transport/} --- transport service
The transport service is responsible for managing the
basic P2P communication. It uses plugins to support P2P communication
over TCP, UDP, HTTP, HTTPS and other protocols. The transport service
validates peer addresses, enforces bandwidth restrictions, limits the
total number of connections and enforces connectivity restrictions (i.e.
friends-only).
@item @file{peerinfo-tool/} --- gnunet-peerinfo
This directory contains the gnunet-peerinfo binary which can be used to
inspect the peers and HELLOs known to the peerinfo service.
@item @file{core/}
The core service is responsible for establishing encrypted, authenticated
connections with other peers, encrypting and decrypting messages and
forwarding messages to higher-level services that are interested in them.
@item @file{testing/} --- libgnunettesting
The testing library allows starting (and stopping) peers
for writing testcases.
It also supports automatic generation of configurations for peers,
ensuring that the ports and paths are disjoint. libgnunettesting is also
the foundation for the testbed service.
@item @file{testbed/} --- testbed service
The testbed service is used for creating small or large scale deployments
of GNUnet peers for evaluation of protocols.
It facilitates peer deployments on multiple
hosts (for example, in a cluster) and establishing various network
topologies (both underlay and overlay).
@item @file{nse/} --- Network Size Estimation
The network size estimation (NSE) service
implements a protocol for (securely) estimating the current size of the
P2P network.
@item @file{dht/} --- distributed hash table
The distributed hash table (DHT) service provides a
distributed implementation of a hash table to store blocks under hash
keys in the P2P network.
@item @file{hostlist/} --- hostlist service
The hostlist service allows learning about
other peers in the network by downloading HELLO messages from an HTTP
server. It can be configured to run such an HTTP server itself, and it
also implements a P2P protocol to advertise and automatically learn about
other peers that offer a public hostlist server.
@item @file{topology/} --- topology service
The topology service is responsible for
maintaining the mesh topology. It tries to maintain connections to friends
(depending on the configuration) and also tries to ensure that the peer
has a decent number of active connections at all times. If necessary, new
connections are added. All peers should run the topology service,
otherwise they may end up not being connected to any other peer (unless
some other service ensures that core establishes the required
connections). The topology service also tells the transport service which
connections are permitted (for friend-to-friend networking).
@item @file{fs/} --- file-sharing
The file-sharing (FS) service implements GNUnet's
file-sharing application. Both anonymous file-sharing (using gap) and
non-anonymous file-sharing (using dht) are supported.
@item @file{cadet/} --- cadet service
The CADET service provides a general-purpose routing abstraction to create
end-to-end encrypted tunnels in mesh networks. We wrote a paper
documenting key aspects of the design.
@item @file{tun/} --- libgnunettun
Library for building IPv4 and IPv6 packets and computing
checksums for UDP, TCP and ICMP packets. The header
defines C structs for common Internet packet formats and in particular
structs for interacting with TUN (virtual network) interfaces.
@item @file{mysql/} --- libgnunetmysql
Library for creating and executing prepared MySQL
statements and to manage the connection to the MySQL database.
Essentially a lightweight wrapper for the interaction between GNUnet
components and libmysqlclient.
@item @file{dns/}
Service that allows intercepting and modifying DNS requests of
the local machine. Currently used for IPv4-IPv6 protocol translation
(DNS-ALG) as implemented by "pt/" and for the GNUnet naming system. The
service can also be configured to offer an exit service for DNS traffic.
@item @file{vpn/} --- VPN service
The virtual private network (VPN) service provides a virtual
tunnel interface (VTUN) for IP routing over GNUnet.
Needs some other peers to run an "exit" service to work.
Can be activated using the "gnunet-vpn" tool or integrated with DNS using
the "pt" daemon.
@item @file{exit/}
Daemon to allow traffic from the VPN to exit this
peer to the Internet or to specific IP-based services of the local peer.
Currently, an exit service can only be restricted to IPv4 or IPv6, not to
specific ports or IP address ranges. If this is not acceptable,
additional firewall rules must be added manually. exit currently only
works for normal UDP, TCP and ICMP traffic; DNS queries need to leave the
system via a DNS service.
@item @file{pt/}
protocol translation daemon. This daemon enables 4-to-6,
6-to-4, 4-over-6 or 6-over-4 transitions for the local system. It
essentially uses the DNS service to intercept DNS replies and then maps
the results to those offered by the VPN, which then sends them using
mesh to some daemon offering an appropriate exit service.
@item @file{identity/}
Management of egos (alter egos) of a user; identities are
essentially named ECC private keys, used for zones in the GNU name
system and for namespaces in file-sharing, but might find other uses
later.
@item @file{revocation/}
Key revocation service; it can be used to revoke the
private key of an identity if it has been compromised.
@item @file{namecache/}
Cache for resolution results for the GNU name system;
data is encrypted and can be shared among users;
loss of the data should ideally only result in a
performance degradation (persistence not required).
@item @file{namestore/}
Database for the GNU name system with per-user private information;
persistence required.
@item @file{gns/}
GNU name system, a GNU approach to DNS and PKI.
@item @file{dv/}
A plugin for distance-vector (DV)-based routing.
DV consists of a service and a transport plugin to provide peers
with the illusion of a direct P2P connection for connections
that use multiple (typically up to 3) hops in the actual underlay network.
@item @file{regex/}
Service for the (distributed) evaluation of regular expressions.
@item @file{scalarproduct/}
The scalar product service offers an API to perform a secure multiparty
computation which calculates a scalar product between two peers
without exposing the private input vectors of the peers to each other.
@item @file{consensus/}
The consensus service will allow a set of peers to agree
on a set of values via a distributed set union computation.
@item @file{rest/}
The REST API allows access to GNUnet services using RESTful interaction.
The services provide plugins that can be exposed by the REST server.
@item @file{experimentation/}
The experimentation daemon coordinates distributed
experimentation to evaluate transport and ATS properties.
@end table

@c ***********************************************************************
@node System Architecture
@section System Architecture

GNUnet developers like LEGOs. The blocks are indestructible, can be
stacked together to construct complex buildings and it is generally easy
to swap one block for a different one that has the same shape. GNUnet's
architecture is based on LEGOs:

@c @image{images/service_lego_block,5in,,picture of a LEGO block stack - 3 APIs as connectors upon Network Protocol on top of a Service}

This chapter documents the GNUnet LEGO system, also known as GNUnet's
system architecture.

The most common GNUnet component is a service. Services offer an API (or
several, depending on what you count as "an API") which is implemented as
a library. The library communicates with the main process of the service
using a service-specific network protocol. The main process of the service
typically doesn't fully provide everything that is needed --- it has holes
to be filled by APIs to other services.

A special kind of component in GNUnet are user interfaces and daemons.
Like services, they have holes to be filled by APIs of other services.
Unlike services, daemons do not implement their own network protocol and
they have no API.

The GNUnet system provides a range of services, daemons and user
interfaces, which are then combined into a layered GNUnet instance (also
known as a peer).

Note that while it is generally possible to swap one service for another
compatible service, there is often only one implementation. However,
during development we often have a "new" version of a service in parallel
with an "old" version. While the "new" version is not yet working,
developers working on other parts of the service can continue their
development by simply using the "old" service. Alternative design ideas can also be
easily investigated by swapping out individual components. This is
typically achieved by simply changing the name of the "BINARY" in the
respective configuration section.
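For example, a configuration entry along the following lines would swap in an alternative DHT implementation (the binary name here is made up purely for illustration):

@example
[dht]
# run this binary instead of the default gnunet-service-dht
BINARY = gnunet-service-dht-experimental
@end example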

Key properties of GNUnet services are that they must be separate
processes and that they must protect themselves by applying tight error
checking against the network protocol they implement (thereby achieving a
certain degree of robustness).

On the other hand, the APIs are implemented to tolerate failures of the
service, isolating their host process from errors by the service. If the
service process crashes, other services and daemons around it should not
also fail, but instead wait for the service process to be restarted by
ARM.


@c ***********************************************************************
@node Subsystem stability
@section Subsystem stability

This section documents the current stability of the various GNUnet
subsystems. Stability here describes the expected degree of compatibility
with future versions of GNUnet. For each subsystem we distinguish between
compatibility on the P2P network level (communication protocol between
peers), the IPC level (communication between the service and the service
library) and the API level (stability of the API). P2P compatibility is
relevant in terms of which applications are likely going to be able to
communicate with future versions of the network. IPC communication is
relevant for the implementation of language bindings that re-implement the
IPC messages. Finally, API compatibility is relevant to developers that
hope to be able to avoid changes to applications built on top of the APIs
of the framework.

The following table summarizes our current view of the stability of the
respective protocols or APIs:

@multitable @columnfractions .20 .20 .20 .20
@headitem Subsystem @tab P2P @tab IPC @tab C API
@item util @tab n/a @tab n/a @tab stable
@item arm @tab n/a @tab stable @tab stable
@item ats @tab n/a @tab unstable @tab testing
@item block @tab n/a @tab n/a @tab stable
@item cadet @tab testing @tab testing @tab testing
@item consensus @tab experimental @tab experimental @tab experimental
@item core @tab stable @tab stable @tab stable
@item datacache @tab n/a @tab n/a @tab stable
@item datastore @tab n/a @tab stable @tab stable
@item dht @tab stable @tab stable @tab stable
@item dns @tab stable @tab stable @tab stable
@item dv @tab testing @tab testing @tab n/a
@item exit @tab testing @tab n/a @tab n/a
@item fragmentation @tab stable @tab n/a @tab stable
@item fs @tab stable @tab stable @tab stable
@item gns @tab stable @tab stable @tab stable
@item hello @tab n/a @tab n/a @tab testing
@item hostlist @tab stable @tab stable @tab n/a
@item identity @tab stable @tab stable @tab n/a
@item multicast @tab experimental @tab experimental @tab experimental
@item mysql @tab stable @tab n/a @tab stable
@item namestore @tab n/a @tab stable @tab stable
@item nat @tab n/a @tab n/a @tab stable
@item nse @tab stable @tab stable @tab stable
@item peerinfo @tab n/a @tab stable @tab stable
@item psyc @tab experimental @tab experimental @tab experimental
@item pt @tab n/a @tab n/a @tab n/a
@item regex @tab stable @tab stable @tab stable
@item revocation @tab stable @tab stable @tab stable
@item social @tab experimental @tab experimental @tab experimental
@item statistics @tab n/a @tab stable @tab stable
@item testbed @tab n/a @tab testing @tab testing
@item testing @tab n/a @tab n/a @tab testing
@item topology @tab n/a @tab n/a @tab n/a
@item transport @tab stable @tab stable @tab stable
@item tun @tab n/a @tab n/a @tab stable
@item vpn @tab testing @tab n/a @tab n/a
@end multitable

Here is a rough explanation of the values:

@table @samp
@item stable
No incompatible changes are planned at this time; for IPC/APIs, if
there are incompatible changes, they will be minor and might only require
minimal changes to existing code; for P2P, changes will be avoided if at
all possible for the 0.10.x-series

@item testing
No incompatible changes are
planned at this time, but the code is still known to be in flux; so while
we have no concrete plans, our expectation is that there will still be
minor modifications; for P2P, changes will likely be extensions that
should not break existing code

@item unstable
Changes are planned and will happen; however, they
will not be totally radical and the result should still resemble what is
there now; nevertheless, anticipated changes will break protocol/API
compatibility

@item experimental
Changes are planned and the result may look nothing like
what the API/protocol looks like today

@item unknown
Someone should think about where this subsystem is headed

@item n/a
This subsystem does not have an API/IPC-protocol/P2P-protocol
@end table

@c ***********************************************************************
@node Naming conventions and coding style guide
@section Naming conventions and coding style guide

Here you can find some rules to help you write code for GNUnet.

@c ***********************************************************************
@menu
* Naming conventions::
* Coding style::
@end menu

@node Naming conventions
@subsection Naming conventions


@c ***********************************************************************
@menu
* include files::
* binaries::
* logging::
* configuration::
* exported symbols::
* private (library-internal) symbols (including structs and macros)::
* testcases::
* performance tests::
* src/ directories::
@end menu

@node include files
@subsubsection include files

@itemize @bullet
@item _lib: library without need for a process
@item _service: library that needs a service process
@item _plugin: plugin definition
@item _protocol: structs used in network protocol
@item exceptions:
@itemize @bullet
@item gnunet_config.h --- generated
@item platform.h --- first included
@item plibc.h --- external library
@item gnunet_common.h --- fundamental routines
@item gnunet_directories.h --- generated
@item gettext.h --- external library
@end itemize
@end itemize
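Following these conventions, a typical GNUnet C source file starts along these lines (illustrative; @file{platform.h} must come first, as noted above):

@example
#include "platform.h"
#include "gnunet_util_lib.h"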

@c ***********************************************************************
@node binaries
@subsubsection binaries

@itemize @bullet
@item gnunet-service-xxx: service process (has listen socket)
@item gnunet-daemon-xxx: daemon process (no listen socket)
@item gnunet-helper-xxx[-yyy]: SUID helper for module xxx
@item gnunet-yyy: command-line tool for end-users
@item libgnunet_plugin_xxx_yyy.so: plugin for API xxx
@item libgnunetxxx.so: library for API xxx
@end itemize

@c ***********************************************************************
@node logging
@subsubsection logging

@itemize @bullet
@item services and daemons use their directory name in
@code{GNUNET_log_setup} (e.g. 'core') and log using
plain '@code{GNUNET_log}'.
@item command-line tools use their full name in
@code{GNUNET_log_setup} (e.g. 'gnunet-publish') and log using
plain '@code{GNUNET_log}'.
@item service access libraries log using
'@code{GNUNET_log_from}' and use '@code{DIRNAME-api}' for the
component (e.g. 'core-api')
@item pure libraries (without an associated service) use
'@code{GNUNET_log_from}' with the component set to their
library name (without 'lib' or '@file{.so}'),
which should also be their directory name (e.g. '@file{nat}')
@item plugins should use '@code{GNUNET_log_from}'
with the directory name and the plugin name combined to produce
the component name (e.g. 'transport-tcp').
@item logging should be unified per-file by defining a
@code{LOG} macro with the appropriate arguments,
along these lines:

@example
#define LOG(kind,...) \
  GNUNET_log_from (kind, "example-api", __VA_ARGS__)
@end example

@end itemize
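Once such a @code{LOG} macro is defined, it is used like @code{GNUNET_log}. A small sketch (assuming a @code{struct GNUNET_PeerIdentity peer} is in scope; @code{GNUNET_i2s} is the real helper for printing peer identities):

@example
LOG (GNUNET_ERROR_TYPE_DEBUG,
     "Connected to peer %s\n",
     GNUNET_i2s (&peer));
@end example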

@c ***********************************************************************
@node configuration
@subsubsection configuration

@itemize @bullet
@item paths (that are substituted in all filenames) are in PATHS
(have as few as possible)
@item all options for a particular module (@file{src/MODULE})
are under @code{[MODULE]}
@item options for a plugin of a module
are under @code{[MODULE-PLUGINNAME]}
@end itemize
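For illustration, a configuration fragment following these conventions might look like this (the concrete option values are made up):

@example
[PATHS]
# paths substituted in other filenames
GNUNET_HOME = /var/lib/gnunet

[transport]
# options for src/transport go under [MODULE]
PORT = 2086

[transport-tcp]
# options for transport's tcp plugin go under [MODULE-PLUGINNAME]
PORT = 2086
@end example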

@c ***********************************************************************
@node exported symbols
@subsubsection exported symbols

@itemize @bullet
@item must start with @code{GNUNET_modulename_} and be defined in
@file{modulename.c}
@item exceptions: those defined in @file{gnunet_common.h}
@end itemize
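For illustration, a public function of a hypothetical module "doodle" (the module and function names are made up) would be named and placed like this:

@example
/* declared in src/include/gnunet_doodle_lib.h,
   defined in src/doodle/doodle.c */
int
GNUNET_DOODLE_start (const struct GNUNET_CONFIGURATION_Handle *cfg);
@end example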

@c ***********************************************************************
@node private (library-internal) symbols (including structs and macros)
@subsubsection private (library-internal) symbols (including structs and macros)

@itemize @bullet
@item must NOT start with any prefix
@item must not be exported in a way that linkers could use them or other
libraries might see them via headers; they must be either
declared/defined in C source files or in headers that are in the
respective directory under @file{src/modulename/} and NEVER be declared
in @file{src/include/}.
@end itemize

@node testcases
@subsubsection testcases

@itemize @bullet
@item must be called @file{test_module-under-test_case-description.c}
@item "case-description" may be omitted if there is only one test
@end itemize

@c ***********************************************************************
@node performance tests
@subsubsection performance tests

@itemize @bullet
@item must be called @file{perf_module-under-test_case-description.c}
@item "case-description" may be omitted if there is only one performance
test
@item Must only be run if @code{HAVE_BENCHMARKS} is satisfied
@end itemize

@c ***********************************************************************
@node src/ directories
@subsubsection src/ directories

@itemize @bullet
@item gnunet-NAME: end-user applications (e.g., gnunet-search, gnunet-arm)
@item gnunet-service-NAME: service processes with accessor library (e.g.,
gnunet-service-arm)
@item libgnunetNAME: accessor library (_service.h-header) or standalone
library (_lib.h-header)
@item gnunet-daemon-NAME: daemon process without accessor library (e.g.,
gnunet-daemon-hostlist) and no GNUnet management port
@item libgnunet_plugin_DIR_NAME: loadable plugins (e.g.,
libgnunet_plugin_transport_tcp)
@end itemize

@cindex Coding style
@node Coding style
@subsection Coding style

@c XXX: Adjust examples to GNU Standards!
@itemize @bullet
@item We follow the GNU Coding Standards (@pxref{Top, The GNU Coding Standards,, standards, The GNU Coding Standards});
@item Indentation is done with spaces, two per level, no tabs;
@item C99 struct initialization is fine;
@item declare only one variable per line, for example:

@noindent
instead of

@example
int i,j;
@end example

@noindent
write:

@example
int i;
int j;
@end example

@c TODO: include actual example from a file in source

@noindent
This helps keep diffs small and forces developers to think precisely about
the type of every variable.
Note that @code{char *} is different from @code{const char*} and
@code{int} is different from @code{unsigned int} or @code{uint32_t}.
Each variable type should be chosen with care.

@item While @code{goto} should generally be avoided, having a
@code{goto} to the end of a function to a block of clean up
statements (free, close, etc.) can be acceptable.

@item Conditions should be written with constants on the left (to avoid
accidental assignment) and with the 'true' target being either the
'error' case or the significantly simpler continuation. For example:

@example
if (0 != stat ("filename", &sbuf)) @{
  error();
 @}
 else @{
   /* handle normal case here */
 @}
@end example

@noindent
instead of

@example
if (stat ("filename", &sbuf) == 0) @{
  /* handle normal case here */
 @} else @{
  error();
 @}
@end example

@noindent
If possible, the error clause should be terminated with a 'return' (or
'goto' to some cleanup routine) and in this case, the 'else' clause
should be omitted:

@example
if (0 != stat ("filename", &sbuf)) @{
  error();
  return;
 @}
/* handle normal case here */
@end example

This serves to avoid deep nesting. The 'constants on the left' rule
applies to all constants (including @code{GNUNET_SCHEDULER_NO_TASK},
NULL, and enums). With the two above rules (constants on left, errors in
'true' branch), there is only one way to write most branches correctly.

@item Combined assignments and tests are allowed if they do not hinder
code clarity. For example, one can write:

@example
if (NULL == (value = lookup_function())) @{
  error();
  return;
 @}
@end example

@item Use @code{break} and @code{continue} wherever possible to avoid
deep(er) nesting. Thus, we would write:

@example
next = head;
while (NULL != (pos = next)) @{
  next = pos->next;
  if (! should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
 @}
@end example

instead of

@example
next = head; while (NULL != (pos = next)) @{
  next = pos->next;
  if (should_free (pos)) @{
    /* unnecessary nesting! */
    GNUNET_CONTAINER_DLL_remove (head, tail, pos);
    GNUNET_free (pos);
   @}
  @}
@end example

@item We primarily use @code{for} and @code{while} loops.
A @code{while} loop is used if the method for advancing in the loop is
not a straightforward increment operation. In particular, we use:

@example
next = head;
while (NULL != (pos = next))
@{
  next = pos->next;
  if (! should_free (pos))
    continue;
  GNUNET_CONTAINER_DLL_remove (head, tail, pos);
  GNUNET_free (pos);
@}
@end example

to free entries in a list (as the iteration changes the structure of the
list due to the free; the equivalent @code{for} loop no longer
follows the simple @code{for} paradigm of @code{for(INIT;TEST;INC)}).
However, for loops that do follow the simple @code{for} paradigm we do
use @code{for}, even if it involves linked lists:

@example
/* simple iteration over a linked list */
for (pos = head;
     NULL != pos;
     pos = pos->next)
@{
   use (pos);
@}
@end example


@item The first argument to all higher-order functions in GNUnet must be
declared to be of type @code{void *} and is reserved for a closure. We do
not use inner functions, as trampolines would conflict with setups that
use non-executable stacks.
The first statement in a higher-order function, which usually should
be part of the variable declarations, should assign the
@code{cls} argument to the precise expected type. For example:

@example
int callback (void *cls, char *args) @{
  struct Foo *foo = cls;
  int other_variables;

   /* rest of function */
@}
@end example


@item It is good practice to write complex @code{if} expressions instead
of using deeply nested @code{if} statements. However, except for addition
and multiplication, all operators should use parens. This is fine:

@example
if ( (1 == foo) || ((0 == bar) && (x != y)) )
  return x;
@end example


However, this is not:

@example
if (1 == foo)
  return x;
if (0 == bar && x != y)
  return x;
@end example

@noindent
Note that splitting the @code{if} statement above is debatable as the
@code{return x} is a very trivial statement. However, once the logic after
the branch becomes more complicated (and is still identical), the "or"
formulation should definitely be used.

@item There should be two empty lines between the end of the function and
the comments describing the following function. There should be a single
empty line after the initial variable declarations of a function. If a
function has no local variables, there should be no initial empty line. If
a long function consists of several complex steps, those steps might be
separated by an empty line (possibly followed by a comment describing the
following step). The code should not contain empty lines in arbitrary
places; if in doubt, it is likely better to NOT have an empty line (this
way, more code will fit on the screen).
@end itemize

@c ***********************************************************************
@node Build-system
@section Build-system

If you have code that is likely not to compile, or build rules you might
not want to trigger for most developers, use @code{if HAVE_EXPERIMENTAL}
in your @file{Makefile.am}.
Then it is OK to (temporarily) add non-compiling (or known-to-not-port)
code.
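
For example, a @file{Makefile.am} might guard an experimental binary as
follows (the target name here is purely illustrative):

@example
if HAVE_EXPERIMENTAL
bin_PROGRAMS += gnunet-service-experimental
endif
@end example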

If you want to compile all testcases but NOT run them, run configure with
the @code{--enable-test-suppression} option.

If you want to run all testcases, including those that take a while, run
configure with the @code{--enable-expensive-testcases} option.

If you want to compile and run benchmarks, run configure with the
@code{--enable-benchmarks} option.

If you want to obtain code coverage results, run configure with the
@code{--enable-coverage} option and run the @file{coverage.sh} script in
the @file{contrib/} directory.

@cindex gnunet-ext
@node Developing extensions for GNUnet using the gnunet-ext template
@section Developing extensions for GNUnet using the gnunet-ext template

For developers who want to write extensions for GNUnet we provide the
gnunet-ext template, an easy-to-use skeleton.

gnunet-ext contains the build environment and template files for the
development of GNUnet services, command line tools, APIs and tests.

First of all you have to obtain gnunet-ext from git:

@example
git clone https://gnunet.org/git/gnunet-ext.git
@end example

The next step is to bootstrap and configure it. For configure you have to
provide the path containing GNUnet with
@code{--with-gnunet=/path/to/gnunet} and the prefix where you want to
install the extension using @code{--prefix=/path/to/install}:

@example
./bootstrap
./configure --prefix=/path/to/install --with-gnunet=/path/to/gnunet
@end example

When your GNUnet installation is not included in the default linker search
path, you have to add @code{/path/to/gnunet} to the file
@file{/etc/ld.so.conf} and run @code{ldconfig}, or add it to the
environment variable @code{LD_LIBRARY_PATH} by using

@example
export LD_LIBRARY_PATH=/path/to/gnunet/lib
@end example

@cindex writing testcases
@node Writing testcases
@section Writing testcases

Ideally, any non-trivial GNUnet code should be covered by automated
testcases. Testcases should reside in the same place as the code that is
being tested. The name of source files implementing tests should begin
with @code{test_} followed by the name of the file that contains
the code that is being tested.

Testcases in GNUnet should be integrated with the autotools build system.
This way, developers and anyone building binary packages will be able to
run all testcases simply by running @code{make check}. The final
testcases shipped with the distribution should output at most some brief
progress information and not display debug messages by default. The
success or failure of a testcase must be indicated by returning zero
(success) or non-zero (failure) from the main method of the testcase.
The integration with the autotools is relatively straightforward and only
requires modifications to the @file{Makefile.am} in the directory
containing the testcase. For a testcase testing the code in @file{foo.c}
the @file{Makefile.am} would contain the following lines:

@example
check_PROGRAMS = test_foo
TESTS = $(check_PROGRAMS)
test_foo_SOURCES = test_foo.c
test_foo_LDADD = $(top_builddir)/src/util/libgnunetutil.la
@end example

Naturally, other libraries used by the testcase may be specified in the
@code{LDADD} directive as necessary.

Often testcases depend on additional input files, such as a configuration
file. These support files have to be listed using the @code{EXTRA_DIST}
directive in order to ensure that they are included in the distribution.

Example:

@example
EXTRA_DIST = test_foo_data.conf
@end example

Executing @code{make check} will run all testcases in the current
directory and all subdirectories. Testcases can be compiled individually
by running @code{make test_foo} and then invoked directly using
@code{./test_foo}. Note that due to the use of plugins in GNUnet, it is
typically necessary to run @code{make install} before running any
testcases. Thus the canonical command @code{make check install} has to be
changed to @code{make install check} for GNUnet.

@cindex TESTING library
@node TESTING library
@section TESTING library

The TESTING library is used for writing testcases which involve starting
one or more peers. While peers can also be started by testcases
using the ARM subsystem, the TESTING library provides an elegant way to
do this. The configurations of the peers are auto-generated from a given
template to have non-conflicting port numbers, ensuring that the peers'
services do not run into bind errors. This is achieved by testing a
port's availability by binding a listening socket to it before allocating
it to a service in the generated configurations.

Another advantage of using TESTING is that it shortens testcase
startup time: the hostkeys for peers are copied from a pre-computed set
of hostkeys instead of being generated at peer startup, which may take a
considerable amount of time when starting multiple peers or on an embedded
processor.

TESTING also allows for certain services to be shared among peers. This
feature is invaluable when testing with multiple peers as it helps to
reduce the number of services run per peer and hence the total
number of processes run per testcase.

The TESTING library only handles creating, starting and stopping peers.
Features useful for testcases, such as connecting peers in a topology,
are not available in TESTING but are available in the TESTBED subsystem.
Furthermore, TESTING only creates peers on the localhost; by using
TESTBED, testcases can benefit from creating peers across multiple
hosts.

@menu
* API::
* Finer control over peer stop::
* Helper functions::
* Testing with multiple processes::
@end menu

@cindex TESTING API
@node API
@subsection API

TESTING abstracts a group of peers as a TESTING system. All peers in a
system have a common hostname and no two services of these peers share a
port or a UNIX domain socket path.

A TESTING system can be created with the function
@code{GNUNET_TESTING_system_create()} which returns a handle to the
system. This function takes a directory path which is used for generating
the configurations of peers, an IP address from which connections to the
peers' services should be allowed, the hostname to be used in peers'
configuration, and an array of shared service specifications of type
@code{struct GNUNET_TESTING_SharedService}.

The shared service specification must specify the name of the service to
share, the configuration pertaining to that shared service and the
maximum number of peers that are allowed to share a single instance of
the shared service.
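
Putting this together, a testcase might create a system that shares a
single statistics service among up to 10 peers. Note that the exact field
order of @code{struct GNUNET_TESTING_SharedService} shown here is a
sketch and should be checked against @file{gnunet_testing_lib.h}:

@example
struct GNUNET_TESTING_SharedService shared[] = @{
  @{ "statistics", NULL, 10 @}, /* one statistics instance per 10 peers */
  @{ NULL, NULL, 0 @}           /* end of list */
@};
struct GNUNET_TESTING_System *system;

system = GNUNET_TESTING_system_create ("/tmp/test-dir",
                                       "127.0.0.1",
                                       NULL, /* default hostname */
                                       shared);
@end example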

A TESTING system created with @code{GNUNET_TESTING_system_create()} chooses
ports from the default range @code{12000} - @code{56000} while
auto-generating configurations for peers.
This range can be customised with the function
@code{GNUNET_TESTING_system_create_with_portrange()}. This function is
similar to @code{GNUNET_TESTING_system_create()} except that it takes two
additional parameters --- the start and end of the port range to use.

A TESTING system is destroyed with the function
@code{GNUNET_TESTING_system_destroy()}. This function takes the handle of
the system and a flag indicating whether to remove the files created in
the directory used to generate configurations.

A peer is created with the function
@code{GNUNET_TESTING_peer_configure()}. This function takes the system
handle, a configuration template from which the configuration for the peer
is auto-generated, and the index of the hostkey to copy for the peer.
When successful, this function returns a handle to the
peer which can be used to start and stop it and to obtain the identity of
the peer. If unsuccessful, a NULL pointer is returned along with an error
message. This function adjusts the generated configuration to have
non-conflicting ports and paths.

Peers can be started and stopped by calling the functions
@code{GNUNET_TESTING_peer_start()} and @code{GNUNET_TESTING_peer_stop()}
respectively. A peer can be destroyed by calling the function
@code{GNUNET_TESTING_peer_destroy}. When a peer is destroyed, the ports
and paths allocated in its configuration are reclaimed for use by new
peers.
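
A minimal peer lifecycle, with error handling omitted, could then look
like this sketch (the argument order should be verified against
@file{gnunet_testing_lib.h}):

@example
struct GNUNET_TESTING_Peer *peer;
char *emsg;

/* auto-generate a configuration from the template in 'cfg',
   using hostkey index 0 */
peer = GNUNET_TESTING_peer_configure (system, cfg, 0, NULL, &emsg);
GNUNET_TESTING_peer_start (peer);
/* ... actual test logic here ... */
GNUNET_TESTING_peer_stop (peer);
GNUNET_TESTING_peer_destroy (peer);
@end example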

@c ***********************************************************************
@node Finer control over peer stop
@subsection Finer control over peer stop

Using @code{GNUNET_TESTING_peer_stop()} is normally fine for testcases.
However, calling this function for each peer is inefficient when trying to
shut down multiple peers, as this function sends the termination signal to
the given peer process and waits for it to terminate. It would be faster
in this case to send the termination signals to the peers first and then
wait on them. This is accomplished by the function
@code{GNUNET_TESTING_peer_kill()}, which sends a termination signal to the
peer, and the function @code{GNUNET_TESTING_peer_wait()}, which waits on
the peer.
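
Assuming an array of peer handles, this faster shutdown pattern could be
sketched as:

@example
/* send termination signals to all peers first ... */
for (unsigned int i = 0; i < num_peers; i++)
  GNUNET_TESTING_peer_kill (peers[i]);
/* ... then wait for each of them to terminate */
for (unsigned int i = 0; i < num_peers; i++)
  GNUNET_TESTING_peer_wait (peers[i]);
@end example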

Further finer control can be achieved by choosing to stop a peer
asynchronously with the function @code{GNUNET_TESTING_peer_stop_async()}.
This function takes a callback parameter and a closure for it in addition
to the handle to the peer to stop. The callback function is called with
the given closure when the peer is stopped. Using this function
eliminates blocking while waiting for the peer to terminate.

An asynchronous peer stop can be cancelled by calling the function
@code{GNUNET_TESTING_peer_stop_async_cancel()}. Note that calling this
function does not prevent the peer from terminating if the termination
signal has already been sent to it. It does, however, cancel the
callback that would be called when the peer is stopped.

@c ***********************************************************************
@node Helper functions
@subsection Helper functions

Most of the testcases can benefit from an abstraction which configures a
peer and starts it. This is provided by the function
@code{GNUNET_TESTING_peer_run()}. This function takes the testing
directory pathname, a configuration template, a callback and its closure.
This function creates a peer in the given testing directory by using the
configuration template, starts the peer and calls the given callback with
the given closure.

The function @code{GNUNET_TESTING_peer_run()} starts the ARM service of
the peer which starts the rest of the configured services. A similar
function @code{GNUNET_TESTING_service_run} can be used to just start a
single service of a peer. In this case, the peer's ARM service is not
started; instead, only the given service is run.
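
A testcase exercising a single service could thus be reduced to the
following sketch (the service and file names are illustrative):

@example
static void
run (void *cls,
     const struct GNUNET_CONFIGURATION_Handle *cfg,
     struct GNUNET_TESTING_Peer *peer)
@{
  /* connect to the service under test and perform checks here */
@}

int
main (int argc, char *argv[])
@{
  return GNUNET_TESTING_service_run ("test-statistics",
                                     "statistics",
                                     "test_statistics.conf",
                                     &run, NULL);
@}
@end example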

@c ***********************************************************************
@node Testing with multiple processes
@subsection Testing with multiple processes

When testing GNUnet, the splitting of the code into services and clients
often complicates testing. The solution to this is to have the testcase
fork @code{gnunet-service-arm}, ask it to start the required server and
daemon processes and then execute appropriate client actions (to test the
client APIs or the core module or both). If necessary, multiple ARM
services can be forked using different ports (!) to simulate a network.
However, most of the time only one ARM process is needed. Note that on
exit, the testcase should shutdown ARM with a @code{TERM} signal (to give
it the chance to cleanly stop its child processes).

The following code illustrates spawning and killing an ARM process from a
testcase:

@example
static void run (void *cls,
                 char *const *args,
                 const char *cfgfile,
                 const struct GNUNET_CONFIGURATION_Handle *cfg) @{
  struct GNUNET_OS_Process *arm_pid;
  arm_pid = GNUNET_OS_start_process (NULL,
                                     NULL,
                                     "gnunet-service-arm",
                                     "gnunet-service-arm",
                                     "-c",
                                     cfgfile,
                                     NULL);
  /* do real test work here */
  if (0 != GNUNET_OS_process_kill (arm_pid, SIGTERM))
    GNUNET_log_strerror
      (GNUNET_ERROR_TYPE_WARNING, "kill");
  GNUNET_assert (GNUNET_OK == GNUNET_OS_process_wait (arm_pid));
  GNUNET_OS_process_close (arm_pid); @}

GNUNET_PROGRAM_run (argc, argv,
                    "NAME-OF-TEST",
                    "nohelp",
                    options,
                    &run,
                    cls);
@end example


An alternative way that works well to test plugins is to implement a
mock-version of the environment that the plugin expects and then to
simply load the plugin directly.

@c ***********************************************************************
@node Performance regression analysis with Gauger
@section Performance regression analysis with Gauger

To help avoid performance regressions, GNUnet uses Gauger. Gauger is a
simple logging tool that allows remote hosts to send performance data to
a central server, where this data can be analyzed and visualized. Gauger
shows graphs of the repository revisions and the performance data recorded
for each revision, so sudden performance peaks or drops can be identified
and linked to a specific revision number.

In the case of GNUnet, the buildbots log the performance data obtained
during the tests after each build. The data can be accessed on GNUnet's
Gauger page.

The menu on the left allows selecting either the results of just one
build bot (under "Hosts") or reviewing the data from all hosts for a
given test result (under "Metrics"). In case of very different absolute
values of the results, for instance arm vs. amd64 machines, the option
"Normalize" on a metric view can help to get an idea about the
performance evolution across all hosts.

Using Gauger in GNUnet and having the performance of a module tracked over
time is very easy. First of course, the testcase must generate some
consistent metric which makes sense to have logged. Highly volatile or
randomness-dependent metrics are probably not ideal candidates for
meaningful regression detection.

To start logging any value, just include @code{gauger.h} in your testcase
code. Then, use the macro @code{GAUGER()} to make the Buildbots log
whatever value is of interest to you to @code{gnunet.org}'s Gauger
server. No setup is necessary, as most Buildbots already have everything
in place and new metrics are created on demand. To delete a metric, you
need to contact a member of the GNUnet development team (a file will need
to be removed manually from the respective directory).

The code in the test should look like this:

@example
[other includes]
#include <gauger.h>

int main (int argc, char *argv[]) @{

  [run test, generate data]
    GAUGER("YOUR_MODULE",
           "METRIC_NAME",
           (float)value,
           "UNIT"); @}
@end example

Where:

@table @asis

@item @strong{YOUR_MODULE} is a category in the gauger page and should be
the name of the module or subsystem like "Core" or "DHT"
@item @strong{METRIC_NAME} is
the name of the metric being collected and should be concise and
descriptive, like "PUT operations in sqlite-datastore".
@item @strong{value} is the value
of the metric that is logged for this run.
@item @strong{UNIT} is the unit in
which the value is measured, for instance "kb/s" or "kb of RAM/node".
@end table

If you wish to use Gauger for your own project, you can grab a copy of the
latest stable release or check out Gauger's Subversion repository.

@cindex TESTBED Subsystem
@node TESTBED Subsystem
@section TESTBED Subsystem

The TESTBED subsystem facilitates testing and measuring of multi-peer
deployments on a single host or over multiple hosts.

The architecture of the testbed module is divided into the following:
@itemize @bullet

@item Testbed API: An API which is used by the testing driver programs. It
provides functions for creating, destroying, starting and stopping
peers, etc.

@item Testbed service (controller): A service which is started through the
Testbed API. This service handles operations to create, destroy, start,
stop peers, connect them, and modify their configurations.

@item Testbed helper: When a controller has to be started on a host, the
testbed API starts the testbed helper on that host which in turn starts
the controller. The testbed helper receives a configuration for the
controller through its stdin and changes it to ensure the controller
doesn't run into any port conflict on that host.
@end itemize


The testbed service (controller) is different from the other GNUnet
services in that it is not started by ARM and is not supposed to be run
as a daemon. It is started by the testbed API through a testbed helper.
In a typical scenario involving multiple hosts, a controller is started
on each host. Controllers take up the actual task of creating peers and
starting and stopping them on the hosts they run on.

While running deployments on a single localhost, the testbed API starts
the testbed helper directly as a child process. When running deployments
on remote hosts, the testbed API starts testbed helpers on each remote
host through a remote shell. By default the testbed API uses SSH as the
remote shell. This can be changed by setting the environment variable
@code{GNUNET_TESTBED_RSH_CMD} to the required remote shell program. This
variable can also contain parameters which are to be passed to the remote
shell program. For example:

@example
export GNUNET_TESTBED_RSH_CMD="ssh -o BatchMode=yes \
-o NoHostAuthenticationForLocalhost=yes %h"
@end example

Substitutions are allowed in the command string above;
this allows for substitutions through placeholders which begin with a `%'.
At present the following substitutions are supported:

@itemize @bullet
@item %h: hostname
@item %u: username
@item %p: port
@end itemize

Note that a substitution placeholder is replaced only when the
corresponding field is available, and only once. Specifying

@example
%u@atchar{}%h
@end example

doesn't work either. If you want to use username substitution with
@command{SSH}, place the argument @code{-l} before the
username substitution.

For example:
@example
ssh -l %u -p %p %h
@end example

The testbed API and the helper communicate through the helper's stdin and
stdout. As the helper is started through a remote shell on remote hosts,
any output messages from the remote shell interfere with this
communication and result in a failure while starting the helper. For this
reason, it is suggested to use flags to make the remote shell produce no
output messages and to have password-less logins. For the default remote
shell, SSH, the default options are:

@example
-o BatchMode=yes -o NoHostAuthenticationForLocalhost=yes
@end example

Password-less logins should be ensured by using SSH keys.

Since the testbed API executes the remote shell as a non-interactive
shell, certain scripts like @file{.bashrc} and @file{.profile} may not be
executed. If this is the case, the testbed API can be forced to execute
an interactive shell by setting the environment variable
@code{GNUNET_TESTBED_RSH_CMD_SUFFIX} to a shell program.

An example could be:

@example
export GNUNET_TESTBED_RSH_CMD_SUFFIX="sh -lc"
@end example

The testbed API will then execute the remote shell program as:

@example
$GNUNET_TESTBED_RSH_CMD -p $port $dest $GNUNET_TESTBED_RSH_CMD_SUFFIX \
gnunet-helper-testbed
@end example

On some systems, problems may arise while starting testbed helpers if
GNUnet is installed into a custom location since the helper may not be
found in the standard path. This can be addressed by setting the variable
@code{HELPER_BINARY_PATH} to the path of the testbed helper.
The testbed API will then use this path to start helper binaries both
locally and remotely.

The testbed API can be accessed by including the
@file{gnunet_testbed_service.h} header and linking with
@code{-lgnunettestbed}.

@c ***********************************************************************
@menu
* Supported Topologies::
* Hosts file format::
* Topology file format::
* Testbed Barriers::
* Automatic large-scale deployment in the PlanetLab testbed::
* TESTBED Caveats::
@end menu

@node Supported Topologies
@subsection Supported Topologies

While testing multi-peer deployments, it is often necessary for the peers
to be connected in some topology. This requirement is addressed by the
function @code{GNUNET_TESTBED_overlay_connect()}, which connects any given
two peers in the testbed.

The API also provides a helper function
@code{GNUNET_TESTBED_overlay_configure_topology()} to connect a given set
of peers in any of the following supported topologies:

@itemize @bullet

@item @code{GNUNET_TESTBED_TOPOLOGY_CLIQUE}: All peers are connected with
each other

@item @code{GNUNET_TESTBED_TOPOLOGY_LINE}: Peers are connected to form a
line

@item @code{GNUNET_TESTBED_TOPOLOGY_RING}: Peers are connected to form a
ring topology

@item @code{GNUNET_TESTBED_TOPOLOGY_2D_TORUS}: Peers are connected to
form a 2-dimensional torus topology. The number of peers may not be a
perfect square; in that case the resulting torus may not have uniform
poloidal and toroidal lengths

@item @code{GNUNET_TESTBED_TOPOLOGY_ERDOS_RENYI}: Topology is generated
to form a random graph. The number of links to be present should be given

@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD}: Peers are connected to
form a 2D torus with some random links among them. The number of random
links is to be given

@item @code{GNUNET_TESTBED_TOPOLOGY_SMALL_WORLD_RING}: Peers are
connected to form a ring with some random links among them. The number of
random links is to be given

@item @code{GNUNET_TESTBED_TOPOLOGY_SCALE_FREE}: Connects peers in a
topology where peer connectivity follows a power law - new peers are
connected with high probability to well-connected peers.
@footnote{See Emergence of Scaling in Random Networks. Science 286,
509-512, 1999
(@uref{https://gnunet.org/git/bibliography.git/plain/docs/emergence_of_scaling_in_random_networks__barabasi_albert_science_286__1999.pdf, pdf})}

@item @code{GNUNET_TESTBED_TOPOLOGY_FROM_FILE}: The topology information
is loaded from a file. The path to the file has to be given.
@xref{Topology file format}, for the format of this file.

@item @code{GNUNET_TESTBED_TOPOLOGY_NONE}: No topology
@end itemize


The above supported topologies can be specified respectively by setting
the variable @code{OVERLAY_TOPOLOGY} to the following values in the
configuration passed to Testbed API functions
@code{GNUNET_TESTBED_test_run()} and
@code{GNUNET_TESTBED_run()}:

@itemize @bullet
@item @code{CLIQUE}
@item @code{RING}
@item @code{LINE}
@item @code{2D_TORUS}
@item @code{RANDOM}
@item @code{SMALL_WORLD}
@item @code{SMALL_WORLD_RING}
@item @code{SCALE_FREE}
@item @code{FROM_FILE}
@item @code{NONE}
@end itemize


Topologies @code{RANDOM}, @code{SMALL_WORLD} and @code{SMALL_WORLD_RING}
require the option @code{OVERLAY_RANDOM_LINKS} to be set to the number of
random links to be generated in the configuration. The option will be
ignored for the rest of the topologies.

Topology @code{SCALE_FREE} requires the options
@code{SCALE_FREE_TOPOLOGY_CAP} to be set to the maximum number of peers
which can connect to a peer, and @code{SCALE_FREE_TOPOLOGY_M} to be set
to how many peers a peer should at least be connected to.

Similarly, the topology @code{FROM_FILE} requires the option
@code{OVERLAY_TOPOLOGY_FILE} to contain the path of the file containing
the topology information. This option is ignored for the rest of the
topologies. @xref{Topology file format}, for the format of this file.
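
For example, a configuration requesting a small-world topology with 25
additional random links might contain the following (assuming these
options belong in the @code{[testbed]} section of the configuration):

@example
[testbed]
OVERLAY_TOPOLOGY = SMALL_WORLD
OVERLAY_RANDOM_LINKS = 25
@end example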

@c ***********************************************************************
@node Hosts file format
@subsection Hosts file format

The testbed API offers the function
@code{GNUNET_TESTBED_hosts_load_from_file()} to load, from a given file,
details about the hosts which testbed can use for deploying peers.
This function is useful for keeping host data separate instead of
hard-coding it in the testcase.

Another helper function from testbed API, @code{GNUNET_TESTBED_run()}
also takes a hosts file name as its parameter. It uses the above
function to populate the hosts data structures and start controllers to
deploy peers.

These functions require the hosts file to be of the following format:
@itemize @bullet
@item Each line is interpreted to have details about a host
@item Host details should include the username to use for logging into the
host, the hostname of the host and the port number to use for the remote
shell program. All three values should be given.
@item These details should be given in the following format:
@example
<username>@@<hostname>:<port>
@end example
@end itemize

Note that having canonical hostnames may cause problems while resolving
the IP addresses (see this bug). Hence it is advised to provide the hosts'
numerical IP addresses as hostnames whenever possible.
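
A hosts file for two machines could therefore look like this (the
username and addresses are illustrative):

@example
alice@@192.168.0.2:22
alice@@192.168.0.3:22
@end example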

@c ***********************************************************************
@node Topology file format
@subsection Topology file format

A topology file describes how peers are to be connected. It should adhere
to the following format for testbed to parse it correctly.

Each line should begin with the target peer id. This should be followed by
a colon (`:') and origin peer ids separated by `|'. All spaces except for
newline characters are ignored. The API will then try to connect each
origin peer to the target peer.

For example, the following file will result in 5 overlay connections:
[2->1], [3->1], [4->3], [0->3], [2->0]

@example
1:2|3
3:4|0
0:2
@end example

@c ***********************************************************************
@node Testbed Barriers
@subsection Testbed Barriers

The testbed subsystem's barriers API facilitates coordination among the
peers run by the testbed and the experiment driver. The concept is
similar to the barrier synchronisation mechanism found in parallel
programming or multi-threading paradigms - a peer waits at a barrier upon
reaching it until the barrier is reached by a predefined number of peers.
This predefined number of peers required to cross a barrier is also called
quorum. We say a peer has reached a barrier if the peer is waiting for the
barrier to be crossed. Similarly a barrier is said to be reached if the
required quorum of peers reach the barrier. A barrier which is reached is
deemed as crossed after all the peers waiting on it are notified.

The barriers API provides the following functions:
@itemize @bullet
@item @strong{@code{GNUNET_TESTBED_barrier_init()}:} function to
initialise a barrier in the experiment
@item @strong{@code{GNUNET_TESTBED_barrier_cancel()}:} function to cancel
a barrier which has been initialised before
@item @strong{@code{GNUNET_TESTBED_barrier_wait()}:} function to signal
barrier service that the caller has reached a barrier and is waiting for
it to be crossed
@item @strong{@code{GNUNET_TESTBED_barrier_wait_cancel()}:} function to
stop waiting for a barrier to be crossed
@end itemize


Among the above functions, the first two, namely
@code{GNUNET_TESTBED_barrier_init()} and
@code{GNUNET_TESTBED_barrier_cancel()} are used by experiment drivers. All
barriers should be initialised by the experiment driver by calling
@code{GNUNET_TESTBED_barrier_init()}. This function takes a name to
identify the barrier, the quorum required for the barrier to be crossed
and a notification callback for notifying the experiment driver when the
barrier is crossed. @code{GNUNET_TESTBED_barrier_cancel()} cancels an
initialised barrier and frees the resources allocated for it. This
function can be called on an initialised barrier before it is crossed.

The remaining two functions @code{GNUNET_TESTBED_barrier_wait()} and
@code{GNUNET_TESTBED_barrier_wait_cancel()} are used in the peer's
processes. @code{GNUNET_TESTBED_barrier_wait()} connects to the local
barrier service running on the same host the peer is running on and
registers that the caller has reached the barrier and is waiting for the
barrier to be crossed. Note that this function can only be used by peers
which are started by testbed as this function tries to access the local
barrier service which is part of the testbed controller service. Calling
@code{GNUNET_TESTBED_barrier_wait()} on an uninitialised barrier results
in failure. @code{GNUNET_TESTBED_barrier_wait_cancel()} cancels the
notification registered by @code{GNUNET_TESTBED_barrier_wait()}.


@c ***********************************************************************
@menu
* Implementation::
@end menu

@node Implementation
@subsubsection Implementation

Since barriers involve coordination between experiment driver and peers,
the barrier service in the testbed controller is split into two
components. The first component responds to the message generated by the
barrier API used by the experiment driver (functions
@code{GNUNET_TESTBED_barrier_init()} and
@code{GNUNET_TESTBED_barrier_cancel()}) and the second component to the
messages generated by barrier API used by peers (functions
@code{GNUNET_TESTBED_barrier_wait()} and
@code{GNUNET_TESTBED_barrier_wait_cancel()}).

Calling @code{GNUNET_TESTBED_barrier_init()} sends a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_INIT} message to the master
controller. The master controller then registers a barrier and calls
@code{GNUNET_TESTBED_barrier_init()} for each of its subcontrollers. In
this way barrier initialisation is propagated down the controller
hierarchy.
While propagating initialisation, any errors at a subcontroller such as
timeout during further propagation are reported up the hierarchy back to
the experiment driver.

Similar to @code{GNUNET_TESTBED_barrier_init()},
@code{GNUNET_TESTBED_barrier_cancel()} propagates a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_CANCEL} message, which causes
the controllers to remove an initialised barrier.

The second component is implemented as a separate service in the binary
`gnunet-service-testbed' which already has the testbed controller service.
Although this deviates from the GNUnet process architecture of having one
service per binary, it is needed in this case as this component needs
access to barrier data created by the first component. This component
responds to @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages from
local peers when they call @code{GNUNET_TESTBED_barrier_wait()}. Upon
receiving a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} message, the
service checks whether the requested barrier has been initialised. If it
has not, an error status is sent through a
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to the local
peer and the connection from the peer is terminated. If the barrier has
been initialised, the barrier's counter of reached peers is incremented
and a notification is registered to notify the peer when the barrier is
reached. The connection from the peer is left open.

When enough peers required to attain the quorum send
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_WAIT} messages, the controller
sends a @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message to its
parent informing that the barrier is crossed. If the controller has
started further subcontrollers, it delays this message until it receives
a similar notification from each of those subcontrollers. Finally, the
barriers API at the experiment driver receives the
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} when the barrier is
reached at all the controllers.

The barriers API at the experiment driver responds to the
@code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message by echoing it
back to the master controller and notifying the experiment controller
through the notification callback that a barrier has been crossed. The
echoed @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS} message is
propagated by the master controller to the controller hierarchy. This
propagation triggers the notifications registered by peers at each of the
controllers in the hierarchy. Note the difference between this downward
propagation of the @code{GNUNET_MESSAGE_TYPE_TESTBED_BARRIER_STATUS}
message and its upward propagation: the upward propagation ensures that
the barrier is reached by all the controllers, while the downward
propagation notifies the waiting peers that the barrier has been crossed.

@cindex PlanetLab testbed
@node Automatic large-scale deployment in the PlanetLab testbed
@subsection Automatic large-scale deployment in the PlanetLab testbed

PlanetLab is a testbed for computer networking and distributed systems
research. It was established in 2002 and as of June 2010 was composed of
1090 nodes at 507 sites worldwide.

To automate the deployment of GNUnet, we created a set of automation
tools that simplify large-scale deployment. We provide a set of scripts
you can use to deploy GNUnet on a set of nodes and manage your
installation.

Please also check @uref{https://gnunet.org/installation-fedora8-svn} and
@uref{https://gnunet.org/installation-fedora12-svn} to find detailed
instructions on how to install GNUnet on a PlanetLab node.


@c ***********************************************************************
@menu
* PlanetLab Automation for Fedora8 nodes::
* Install buildslave on PlanetLab nodes running fedora core 8::
* Setup a new PlanetLab testbed using GPLMT::
* Why do i get an ssh error when using the regex profiler?::
@end menu

@node PlanetLab Automation for Fedora8 nodes
@subsubsection PlanetLab Automation for Fedora8 nodes

@c ***********************************************************************
@node Install buildslave on PlanetLab nodes running fedora core 8
@subsubsection Install buildslave on PlanetLab nodes running fedora core 8
@c ** Actually this is a subsubsubsection, but must be fixed differently
@c ** as subsubsection is the lowest.

Since most of the PlanetLab nodes are running the very old Fedora Core 8
image, installing the buildslave software is quite painful. For our
PlanetLab testbed we figured out how best to install the buildslave
software.

@c This is a vvery terrible way to suggest installing software.
@c FIXME: Is there an official, safer way instead of blind-piping a
@c script?
@c FIXME: Use newer pypi URLs below.
Install Distribute for Python:

@example
curl http://python-distribute.org/distribute_setup.py | sudo python
@end example

Install Distribute for zope.interface <= 3.8.0 (4.0 and 4.0.1 will not
work):

@example
export PYPI=@value{PYPI-URL}
wget $PYPI/z/zope.interface/zope.interface-3.8.0.tar.gz
tar xvfz zope.interface-3.8.0.tar.gz
cd zope.interface-3.8.0
sudo python setup.py install
@end example

Install the buildslave software (0.8.6 was the latest version):

@example
export GCODE="http://buildbot.googlecode.com/files"
wget $GCODE/buildbot-slave-0.8.6p1.tar.gz
tar xvfz buildbot-slave-0.8.6p1.tar.gz
cd buildbot-slave-0.8.6p1
sudo python setup.py install
@end example

The setup will download the matching Twisted package and install it.
It will also try to install the latest version of zope.interface, which
will fail to install; the buildslave will nevertheless work, since
version 3.8.0 was installed before.

@c ***********************************************************************
@node Setup a new PlanetLab testbed using GPLMT
@subsubsection Setup a new PlanetLab testbed using GPLMT

@itemize @bullet
@item Get a new slice and assign nodes
Ask your PlanetLab PI to give you a new slice and assign the nodes you
need
@item Install a buildmaster
You can stick to the buildbot documentation:@
@uref{http://buildbot.net/buildbot/docs/current/manual/installation.html}
@item Install the buildslave software on all nodes
To install the buildslave on all nodes assigned to your slice you can use
the tasklist @code{install_buildslave_fc8.xml} provided with GPLMT:

@example
./gplmt.py -c contrib/tumple_gnunet.conf -t \
contrib/tasklists/install_buildslave_fc8.xml -a -p <planetlab password>
@end example

@item Create the buildmaster configuration and the slave setup commands

The master and the slaves need to have credentials, and the master has
to have all nodes configured. This can be done with the
@file{create_buildbot_configuration.py} script in the @file{scripts}
directory.

This script takes a list of nodes, retrieved directly from PlanetLab or
read from a file, and a configuration template, and creates:

@itemize @bullet
@item a tasklist which can be executed with gplmt to set up the slaves
@item a @file{master.cfg} file containing the PlanetLab nodes
@end itemize

A configuration template is included in the @file{contrib} directory.
Most importantly, the script replaces the following tags in the template:

@example
%GPLMT_BUILDER_DEFINITION
%GPLMT_BUILDER_SUMMARY
%GPLMT_SLAVES
%GPLMT_SCHEDULER_BUILDERS
@end example

Create configuration for all nodes assigned to a slice:

@example
./create_buildbot_configuration.py -u <planetlab username> \
-p <planetlab password> -s <slice> -m <buildmaster+port> \
-t <template>
@end example

Create configuration for some nodes in a file:

@example
./create_buildbot_configuration.py -f <node_file> \
-m <buildmaster+port> -t <template>
@end example

@item Copy the @file{master.cfg} to the buildmaster and start it
Use @code{buildbot start <basedir>} to start the server.
@item Setup the buildslaves
@end itemize

@c ***********************************************************************
@node Why do i get an ssh error when using the regex profiler?
@subsubsection Why do i get an ssh error when using the regex profiler?

Why do I get an ssh error "Permission denied (publickey,password)." when
using the regex profiler, although passwordless ssh to localhost works
using publickey and ssh-agent?

You have to generate a public/private key pair with no password:

@example
ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_localhost
@end example

and then add the following to your @file{~/.ssh/config} file:

@example
Host 127.0.0.1
    IdentityFile ~/.ssh/id_localhost
@end example

Now make sure your hosts file looks like this:

@example
[USERNAME]@@127.0.0.1:22@
[USERNAME]@@127.0.0.1:22
@end example

You can test your setup by running @code{ssh 127.0.0.1} in a
terminal and then in the opened session run it again.
If you were not asked for a password on either login,
then you should be good to go.

@cindex TESTBED Caveats
@node TESTBED Caveats
@subsection TESTBED Caveats

This section documents a few caveats when using the GNUnet testbed
subsystem.

@c ***********************************************************************
@menu
* CORE must be started::
* ATS must want the connections::
@end menu

@node CORE must be started
@subsubsection CORE must be started

A simple issue is #3993: Your configuration MUST somehow ensure that for
each peer the CORE service is started when the peer is setup, otherwise
TESTBED may fail to connect peers when the topology is initialized, as
TESTBED will start some CORE services but not necessarily all (but it
relies on all of them running). The easiest way is to set
'FORCESTART = YES' in the '[core]' section of the configuration file.
Alternatively, having any service that directly or indirectly depends on
CORE being started with FORCESTART will also do. This issue largely arises
if users try to over-optimize by not starting any services with
FORCESTART.

@c ***********************************************************************
@node ATS must want the connections
@subsubsection ATS must want the connections

When TESTBED sets up connections, it only offers the respective HELLO
information to the TRANSPORT service. It is then up to the ATS service to
@strong{decide} to use the connection. The ATS service will typically
eagerly establish any connection if the number of total connections is
low (relative to bandwidth). Details may further depend on the
specific ATS backend that was configured. If ATS decides to NOT establish
a connection (even though TESTBED provided the required information), then
that connection will count as failed for TESTBED. Note that you can
configure TESTBED to tolerate a certain number of connection failures
(see '-e' option of gnunet-testbed-profiler). This issue largely arises
for dense overlay topologies, especially if you try to create cliques
with more than 20 peers.

@cindex libgnunetutil
@node libgnunetutil
@section libgnunetutil

libgnunetutil is the fundamental library that all GNUnet code builds upon.
Ideally, this library should contain most of the platform dependent code
(except for user interfaces and really special needs that only few
applications have). It is also supposed to offer basic services that most
if not all GNUnet binaries require. The code of libgnunetutil is in the
@file{src/util/} directory. The public interface to the library is in the
gnunet_util.h header. The functions provided by libgnunetutil fall
roughly into the following categories (in roughly the order of importance
for new developers):

@itemize @bullet
@item logging (common_logging.c)
@item memory allocation (common_allocation.c)
@item endianness conversion (common_endian.c)
@item internationalization (common_gettext.c)
@item String manipulation (string.c)
@item file access (disk.c)
@item buffered disk IO (bio.c)
@item time manipulation (time.c)
@item configuration parsing (configuration.c)
@item command-line handling (getopt*.c)
@item cryptography (crypto_*.c)
@item data structures (container_*.c)
@item CPS-style scheduling (scheduler.c)
@item Program initialization (program.c)
@item Networking (network.c, client.c, server*.c, service.c)
@item message queueing (mq.c)
@item bandwidth calculations (bandwidth.c)
@item Other OS-related (os*.c, plugin.c, signal.c)
@item Pseudonym management (pseudonym.c)
@end itemize

It should be noted that only developers who fully understand this entire
API will be able to write good GNUnet code.

Ideally, porting GNUnet should only require porting the gnunetutil
library. More testcases for the gnunetutil APIs are therefore a great
way to make porting of GNUnet easier.

@menu
* Logging::
* Interprocess communication API (IPC)::
* Cryptography API::
* Message Queue API::
* Service API::
* Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps::
* CONTAINER_MDLL API::
@end menu

@cindex Logging
@cindex log levels
@node Logging
@subsection Logging

GNUnet is able to log its activity, mostly for the purposes of debugging
the program at various levels.

@file{gnunet_common.h} defines several @strong{log levels}:
@table @asis

@item ERROR for errors (really problematic situations, often leading to
crashes)
@item WARNING for warnings (troubling situations that might have
negative consequences, although not fatal)
@item INFO for various information.
Used somewhat rarely, as GNUnet statistics is used to hold and display
most of the information that users might find interesting.
@item DEBUG for debugging.
Does not produce much output on normal builds, but when extra logging is
enabled at compile time, a staggering amount of data is output under
this log level.
@end table


Normal builds of GNUnet (configured with @code{--enable-logging[=yes]})
are supposed to log nothing under DEBUG level. The
@code{--enable-logging=verbose} configure option can be used to create a
build with all logging enabled. However, such a build will produce large
amounts of log data, which is inconvenient when one tries to hunt down a
specific problem.

To mitigate this problem, GNUnet provides facilities to apply a filter to
reduce the logs:
@table @asis

@item Logging by default When no log levels are configured in any other
way (see below), GNUnet will default to the WARNING log level. This
mostly applies to GNUnet command line utilities, services and daemons;
tests will always set log level to WARNING or, if
@code{--enable-logging=verbose} was passed to configure, to DEBUG. The
default level is suggested for normal operation.
@item The -L option Most GNUnet executables accept an "-L loglevel" or
"--log=loglevel" option. If used, it makes the process set a global log
level to "loglevel". Thus it is possible to run some processes
with -L DEBUG, for example, and others with -L ERROR to enable specific
settings to diagnose problems with a particular process.
@item Configuration files.  Because GNUnet
service and daemon processes are usually launched by gnunet-arm, it is not
possible to pass different custom command line options directly to every
one of them. The options passed to @code{gnunet-arm} only affect
gnunet-arm and not the rest of GNUnet. However, one can specify a
configuration key "OPTIONS" in the section that corresponds to a service
or a daemon, and put a value of "-L loglevel" there. This will make the
respective service or daemon set its log level to "loglevel" (as the
value of OPTIONS will be passed as a command-line argument).

To specify the same log level for all services without creating separate
"OPTIONS" entries in the configuration for each one, the user can specify
a config key "GLOBAL_POSTFIX" in the [arm] section of the configuration
file. The value of GLOBAL_POSTFIX will be appended to all command lines
used by the ARM service to run other services. It can contain any option
valid for all GNUnet commands, thus in particular the "-L loglevel"
option. The ARM service itself is, however, unaffected by GLOBAL_POSTFIX;
to set log level for it, one has to specify "OPTIONS" key in the [arm]
section.
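For example (the service section shown here is illustrative), the
following configuration would run the transport service with DEBUG
level, while every other service started by ARM runs with WARNING:

@example
[transport]
OPTIONS = -L DEBUG

[arm]
GLOBAL_POSTFIX = -L WARNING
@end example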
@item Environment variables.
Setting global per-process log levels with "-L loglevel" does not offer
sufficient log filtering granularity, as one service will call interface
libraries and supporting libraries of other GNUnet services, potentially
producing lots of debug log messages from these libraries. Also, changing
the config file is not always convenient (especially when running the
GNUnet test suite).@ To fix that, and to allow GNUnet to use different
log filtering at runtime without re-compiling the whole source tree, the
log calls were changed to be configurable at run time. To configure them
one has to define environment variables "GNUNET_FORCE_LOGFILE",
"GNUNET_LOG" and/or "GNUNET_FORCE_LOG":
@itemize @bullet

@item "GNUNET_LOG" only affects the logging when no global log level is
configured by any other means (that is, the process does not explicitly
set its own log level, there are no "-L loglevel" options on command line
or in configuration files), and can be used to override the default
WARNING log level.

@item "GNUNET_FORCE_LOG" will completely override any other log
configuration options given.

@item "GNUNET_FORCE_LOGFILE" will completely override the location of the
file to log messages to. It should contain a relative or absolute file
name. Setting GNUNET_FORCE_LOGFILE is equivalent to passing
"--log-file=logfile" or "-l logfile" option (see below). It supports "[]"
format in file names, but not "@{@}" (see below).
@end itemize


Because environment variables are inherited by child processes when they
are launched, starting or re-starting the ARM service with these
variables will propagate them to all other services.

"GNUNET_LOG" and "GNUNET_FORCE_LOG" variables must contain a specially
formatted @strong{logging definition} string, which looks like this:@

@c FIXME: Can we close this with [/component] instead?
@example
[component];[file];[function];[from_line[-to_line]];loglevel[/component...]
@end example

That is, a logging definition consists of definition entries, separated by
slashes ('/'). If only one entry is present, there is no need to add a
slash to its end (although it is not forbidden either).@ All definition
fields (component, file, function, lines and loglevel) are mandatory, but
(except for the loglevel) they can be empty. An empty field means
"match anything". Note that even if fields are empty, the semicolon (';')
separators must be present.@ The loglevel field is mandatory, and must
contain one of the log level names (ERROR, WARNING, INFO or DEBUG).@
The lines field might contain one non-negative number, in which case it
matches only one line, or a range "from_line-to_line", in which case it
matches any line in the interval [from_line;to_line] (that is, including
both start and end line).@ GNUnet mostly defaults component name to the
name of the service that is implemented in a process ('transport',
'core', 'peerinfo', etc), but logging calls can specify custom component
names using @code{GNUNET_log_from}.@ File name and function name are
provided by the compiler (__FILE__ and __FUNCTION__ built-ins).

Component, file and function fields are interpreted as non-extended
regular expressions (GNU libc regex functions are used). Matching is
case-sensitive, "^" and "$" will match the beginning and the end of the
text. If a field is empty, its contents are automatically replaced with
a ".*" regular expression, which matches anything. Matching is done in
the default way, which means that the expression matches as long as it's
contained anywhere in the string. Thus "GNUNET_" will match both
"GNUNET_foo" and "BAR_GNUNET_BAZ". Use '^' and/or '$' to make sure that
the expression matches at the start and/or at the end of the string.
The semicolon (';') can't be escaped, and GNUnet will not use it in
component names (it can't be used in function names and file names
anyway).

@end table


Every logging call in GNUnet code will be (at run time) matched against
the log definitions passed to the process. If a log definition's fields
match the call arguments, then the call's log level is compared to the
log level of that definition. If the call log level is less or equal to
the definition log level, the call is allowed to proceed. Otherwise the
logging call is forbidden, and nothing is logged. If no definitions
matched at all, GNUnet will use the global log level or (if a global log
level is not specified) will default to WARNING (that is, it will allow
the call to proceed, if its level is less or equal to the global log
level or to WARNING).

That is, definitions are evaluated from left to right, and the first
matching definition is used to allow or deny the logging call. Thus it is
advised to place narrow definitions at the beginning of the logdef
string, and generic definitions at the end.

Whether a call is allowed or not is only decided the first time this
particular call is made. The evaluation result is then cached, so that
any attempts to make the same call later will be allowed or disallowed
right away. Because of this caching, runtime log level evaluation should not
significantly affect the process performance.
Log definition parsing is only done once, at the first call to
GNUNET_log_setup () made by the process (which is usually done soon after
it starts).

At the moment of writing there is no way to specify logging definitions
from configuration files, only via environment variables.

At the moment GNUnet will stop processing a log definition when it
encounters an error in definition formatting or an error in regular
expression syntax, and will not report the failure in any way.


@c ***********************************************************************
@menu
* Examples::
* Log files::
* Updated behavior of GNUNET_log::
@end menu

@node Examples
@subsubsection Examples

@table @asis

@item @code{GNUNET_FORCE_LOG=";;;;DEBUG" gnunet-arm -s} Start GNUnet
process tree, running all processes with DEBUG level (one should be
careful with it, as log files will grow at an alarming rate!)
@item @code{GNUNET_FORCE_LOG="core;;;;DEBUG" gnunet-arm -s} Start GNUnet
process tree, running the core service under DEBUG level (everything else
will use configured or default level).

@item Start GNUnet process tree, allowing any logging calls from
gnunet-service-transport_validation.c (everything else will use
configured or default level).

@example
GNUNET_FORCE_LOG=";gnunet-service-transport_validation.c;;; DEBUG" \
gnunet-arm -s
@end example

@item Start GNUnet process tree, allowing any logging calls from
gnunet-service-fs_push.c (everything else will use configured or
default level).

@example
GNUNET_FORCE_LOG="fs;gnunet-service-fs_push.c;;;DEBUG" gnunet-arm -s
@end example

@item Start GNUnet process tree, allowing any logging calls from the
GNUNET_NETWORK_socket_select function (everything else will use
configured or default level).

@example
GNUNET_FORCE_LOG=";;GNUNET_NETWORK_socket_select;;DEBUG" gnunet-arm -s
@end example

@item Start GNUnet process tree, allowing any logging calls from the
components that have "transport" in their names, and are made from
function that have "send" in their names. Everything else will be allowed
to be logged only if it has WARNING level.

@example
GNUNET_FORCE_LOG="transport.*;;.*send.*;;DEBUG/;;;;WARNING" gnunet-arm -s
@end example

@end table


On Windows, one can use batch files to run GNUnet processes with special
environment variables, without affecting the whole system. Such batch
file will look like this:

@example
set GNUNET_FORCE_LOG=;;do_transmit;;DEBUG
gnunet-arm -s
@end example

(note the absence of double quotes in the environment variable definition,
as opposed to earlier examples, which use the shell).
Another limitation: on Windows, GNUNET_FORCE_LOGFILE @strong{MUST} be set
in order for GNUNET_FORCE_LOG to work.


@cindex Log files
@node Log files
@subsubsection Log files

GNUnet can be told to log everything into a file instead of stderr (which
is the default) using the "--log-file=logfile" or "-l logfile" option.
This option can also be passed via command line, or from the "OPTION" and
"GLOBAL_POSTFIX" configuration keys (see above). The file name passed
with this option is subject to GNUnet filename expansion. If specified in
"GLOBAL_POSTFIX", it is also subject to ARM service filename expansion,
in particular, it may contain "@{@}" (left and right curly brace)
sequence, which will be replaced by ARM with the name of the service.
This is used to keep logs from more than one service separate, while only
specifying one template containing "@{@}" in GLOBAL_POSTFIX.

As part of a secondary file name expansion, the first occurrence of "[]"
sequence ("left square brace" followed by "right square brace") in the
file name will be replaced with the process identifier of the process
when it initializes its logging subsystem. As a result, all processes
will log
into different files. This is convenient for isolating messages of a
particular process, and prevents I/O races when multiple processes try to
write into the file at the same time. This expansion is done
independently of "@{@}" expansion that ARM service does (see above).

The log file name that is specified via "-l" can contain format characters
from the 'strftime' function family. For example, "%Y" will be replaced
with the current year. Using "basename-%Y-%m-%d.log" would include the
current year, month and day in the log file. If a GNUnet process runs for
long enough to need more than one log file, it will eventually clean up
old log files. Currently, only the last three log files (plus the current
log file) are preserved. So once the fifth log file goes into use (so
after 4 days if you use "%Y-%m-%d" as above), the first log file will be
automatically deleted. Note that if your log file name only contains "%Y",
then log files would be kept for 4 years and the logs from the first year
would be deleted once year 5 begins. If you do not use any date-related
string format codes, logs would never be automatically deleted by GNUnet.


@c ***********************************************************************

@node Updated behavior of GNUNET_log
@subsubsection Updated behavior of GNUNET_log

It's currently quite common to see constructions like this all over the
code:

@example
#if MESH_DEBUG
GNUNET_log (GNUNET_ERROR_TYPE_DEBUG, "MESH: client disconnected\n");
#endif
@end example

The reason for the #if is not to avoid displaying the message when
disabled (GNUNET_ERROR_TYPE takes care of that), but to avoid the
compiler including it in the binary at all, when compiling GNUnet for
platforms with restricted storage space / memory (MIPS routers,
ARM plug computers / dev boards, etc).

This presents several problems: the code gets ugly, hard to write and it
is very easy to forget to include the #if guards, creating non-consistent
code. A new change in GNUNET_log aims to solve these problems.

@strong{This change requires to @file{./configure} with at least
@code{--enable-logging=verbose} to see debug messages.}

Here is an example of code with dense debug statements:

@example
switch (restrict_topology)
@{
case GNUNET_TESTING_TOPOLOGY_CLIQUE:
#if VERBOSE_TESTING
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
              _("Blacklisting all but clique topology\n"));
#endif
  unblacklisted_connections =
    create_clique (pg, &remove_connections, BLACKLIST, GNUNET_NO);
  break;
case GNUNET_TESTING_TOPOLOGY_SMALL_WORLD_RING:
#if VERBOSE_TESTING
  GNUNET_log (GNUNET_ERROR_TYPE_DEBUG,
              _("Blacklisting all but small world (ring) topology\n"));
#endif
  unblacklisted_connections =
    create_small_world_ring (pg, &remove_connections, BLACKLIST);
  break;
@}
@end example


Pretty hard to follow, huh?

From now on, it is not necessary to include the #if / #endif statements to
achieve the same behavior. The GNUNET_log and GNUNET_log_from macros take
care of it for you, depending on the configure option:

@itemize @bullet
@item If @code{--enable-logging} is set to @code{no}, the binary will
contain no log messages at all.
@item If @code{--enable-logging} is set to @code{yes}, the binary will
contain no DEBUG messages, and therefore running with -L DEBUG will have
no effect. Other messages (ERROR, WARNING, INFO, etc) will be included.
@item If @code{--enable-logging} is set to @code{verbose}, or
@code{veryverbose}, the binary will contain DEBUG messages (still, it
will be necessary to run with -L DEBUG or set the DEBUG config option to
show them).
@end itemize


If you are a developer:
@itemize @bullet
@item please make sure that you @code{./configure
--enable-logging=@{verbose,veryverbose@}}, so you can see DEBUG messages.
@item please remove the @code{#if} statements around @code{GNUNET_log
(GNUNET_ERROR_TYPE_DEBUG, ...)} lines, to improve the readability of your
code.
@end itemize

Since activating DEBUG now automatically makes it VERBOSE and activates
@strong{all} debug messages by default, you probably want to use the
@uref{https://gnunet.org/logging} functionality to filter only relevant
messages.
A suitable configuration could be:

@example
$ export GNUNET_FORCE_LOG="^YOUR_SUBSYSTEM$;;;;DEBUG/;;;;WARNING"
@end example

This will behave almost like enabling DEBUG in that subsystem before the
change. Of course you can adapt it to your particular needs; this is only
a quick example.

@cindex Interprocess communication API
@cindex IPC
@node Interprocess communication API (IPC)
@subsection Interprocess communication API (IPC)

In GNUnet, a variety of new message types might be defined and used in
interprocess communication. In this tutorial, we use the
@code{struct AddressLookupMessage} as an example to introduce how to
construct our own message type in GNUnet and how to implement message
communication between service and client.
(Here, a client uses the @code{struct AddressLookupMessage} as a request
to ask the server to return the address of any other peer connecting to
the service.)


@c ***********************************************************************
@menu
* Define new message types::
* Define message struct::
* Client - Establish connection::
* Client - Initialize request message::
* Client - Send request and receive response::
* Server - Startup service::
* Server - Add new handles for specified messages::
* Server - Process request message::
* Server - Response to client::
* Server - Notification of clients::
* Conversion between Network Byte Order (Big Endian) and Host Byte Order::
@end menu

@node Define new message types
@subsubsection Define new message types

First of all, you should define the new message type in
@file{gnunet_protocols.h}:

@example
// Request to look up addresses of peers in server.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP 29
// Response to the address lookup request.
#define GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY 30
@end example

@c ***********************************************************************
@node Define message struct
@subsubsection Define message struct

After the type definition, the specified message structure should also be
described in the header file, e.g., @file{transport.h} in our case.

@example
GNUNET_NETWORK_STRUCT_BEGIN
struct AddressLookupMessage @{
  struct GNUNET_MessageHeader header;
  int32_t numeric_only GNUNET_PACKED;
  struct GNUNET_TIME_AbsoluteNBO timeout;
  uint32_t addrlen GNUNET_PACKED;
  /* followed by 'addrlen' bytes of the actual address, then
     followed by the 0-terminated name of the transport */
@};
GNUNET_NETWORK_STRUCT_END
@end example


Please note @code{GNUNET_NETWORK_STRUCT_BEGIN} and @code{GNUNET_PACKED}
which both ensure correct alignment when sending structs over the network.
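
To make the wire layout concrete, here is a hedged, self-contained C
sketch of how such a variable-length message sits in memory. The types
@code{struct MessageHeader} and @code{struct AddressLookupMessage} below
are simplified stand-ins for the GNUnet types (the timeout field is
omitted for brevity); this is an illustration, not the actual GNUnet
implementation:

@example
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the GNUnet types; illustration only. */
struct MessageHeader @{
  uint16_t size;
  uint16_t type;
@};

struct AddressLookupMessage @{
  struct MessageHeader header;
  int32_t numeric_only;
  uint32_t addrlen;
  /* followed by 'addrlen' bytes of address,
     then the 0-terminated transport name */
@};

int
main (void)
@{
  const unsigned char addr[4] = @{ 127, 0, 0, 1 @};
  const char *name = "tcp";
  size_t len = sizeof (struct AddressLookupMessage)
    + sizeof (addr) + strlen (name) + 1;
  struct AddressLookupMessage *msg = calloc (1, len);

  /* the variable-length trailer starts right after the fixed struct */
  char *trailer = (char *) &msg[1];
  memcpy (trailer, addr, sizeof (addr));
  memcpy (trailer + sizeof (addr), name, strlen (name) + 1);
  printf ("total size: %zu, transport: %s\n",
          len, trailer + sizeof (addr));
  free (msg);
  return 0;
@}
@end example

Note how the total allocation size is the fixed struct size plus the
variable-length trailer, exactly as for the real message.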


@c ***********************************************************************
@node Client - Establish connection
@subsubsection Client - Establish connection
@c %**end of header


First, on the client side, the client API is used to create a
new connection to a service; in our example, we connect to the transport
service.

@example
struct GNUNET_CLIENT_Connection *client;
client = GNUNET_CLIENT_connect ("transport", cfg);
@end example

@c ***********************************************************************
@node Client - Initialize request message
@subsubsection Client - Initialize request message
@c %**end of header

When the connection is ready, we initialize the message. In this step,
all the fields of the message should be properly initialized, namely the
size, type, and some extra user-defined data, such as the timeout, the
address and the name of the transport.

@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
  + addressLen
  + strlen (nameTrans)
  + 1;
msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons
  (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP);
msg->timeout = GNUNET_TIME_absolute_hton (abs_timeout);
msg->addrlen = htonl (addressLen);
char *addrbuf = (char *) &msg[1];
memcpy (addrbuf, address, addressLen);
char *tbuf = &addrbuf[addressLen];
memcpy (tbuf, nameTrans, strlen (nameTrans) + 1);
@end example

Note that here the functions @code{htonl}, @code{htons} and
@code{GNUNET_TIME_absolute_hton} are applied to convert values from host
byte order into network byte order (big endian). For details on byte
orders and the corresponding conversion functions, please refer to the
section on Conversion between Network Byte Order (Big Endian) and Host
Byte Order.

@c ***********************************************************************
@node Client - Send request and receive response
@subsubsection Client - Send request and receive response
@c %**end of header

@b{FIXME: This is very outdated, see the tutorial for the current API!}

Next, the client would send the constructed message as a request to the
service and wait for the response from the service. To accomplish this
goal, there are a number of API calls that can be used. In this example,
@code{GNUNET_CLIENT_transmit_and_get_response} is chosen as the most
appropriate function to use.

@example
GNUNET_CLIENT_transmit_and_get_response
(client, msg->header, timeout, GNUNET_YES, &address_response_processor,
arp_ctx);
@end example

The argument @code{address_response_processor} is a function of type
@code{GNUNET_CLIENT_MessageHandler}, which is used to process the
reply message from the service.

@node Server - Startup service
@subsubsection Server - Startup service

On the server side, we run a standard GNUnet service
startup sequence using @code{GNUNET_SERVICE_run}, as follows:

@example
int
main (int argc, char **argv)
@{
  GNUNET_SERVICE_run (argc, argv, "transport",
                      GNUNET_SERVICE_OPTION_NONE, &run, NULL);
@}
@end example

@c ***********************************************************************
@node Server - Add new handles for specified messages
@subsubsection Server - Add new handles for specified messages
@c %**end of header

In the function above, the argument @code{run} is used to initialize the
transport service, and is defined like this:

@example
static void
run (void *cls,
     struct GNUNET_SERVER_Handle *serv,
     const struct GNUNET_CONFIGURATION_Handle *cfg)
@{
  GNUNET_SERVER_add_handlers (serv, handlers);
@}
@end example


Here, @code{GNUNET_SERVER_add_handlers} must be called in the run
function to add new handlers in the service. The parameter
@code{handlers} is a list of @code{struct GNUNET_SERVER_MessageHandler}
to tell the service which function should be called when a particular
type of message is received, and should be defined in this way:

@example
static struct GNUNET_SERVER_MessageHandler handlers[] = @{
  @{&handle_start,
   NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_START,
   0@},
  @{&handle_send,
   NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_SEND,
   0@},
  @{&handle_try_connect,
   NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_TRY_CONNECT,
   sizeof (struct TryConnectMessage)
  @},
  @{&handle_address_lookup,
   NULL,
   GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP,
   0@},
  @{NULL,
   NULL,
   0,
   0@}
@};
@end example


As shown, the first member of each entry is a callback function, which is
called to process the message type given as the third member. The second
member is the closure for the callback function, which is @code{NULL} in
most cases; the last member is the expected size of messages of this
type. Usually we set it to 0 to accept variable sizes; for special cases,
the exact size of the specified message can also be set. In addition, the
terminator entry @code{@{NULL, NULL, 0, 0@}} is set at the end.

@c ***********************************************************************
@node Server - Process request message
@subsubsection Server - Process request message
@c %**end of header

After the initialization of the transport service, the request message
will be processed. Before handling the main message data, the validity of
the message should be checked, e.g., whether the size of the message is
correct:

@example
size = ntohs (message->size);
if (size < sizeof (struct AddressLookupMessage)) @{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client, GNUNET_SYSERR);
  return; @}
@end example


Note that, in contrast to the construction of the request message in the
client, on the server side the functions @code{ntohl} and @code{ntohs}
should be employed when extracting data from the message, so that the
data can be converted from network byte order (big endian) back into host
byte order. For more details, please refer to the section on Conversion
between Network Byte Order (Big Endian) and Host Byte Order.

Moreover in this example, the name of the transport stored in the message
is a 0-terminated string, so we should also check whether the name of the
transport in the received message is 0-terminated:

@example
nameTransport = (const char *) &address[addressLen];
if (nameTransport[size - sizeof
                  (struct AddressLookupMessage)
                  - addressLen - 1] != '\0') @{
  GNUNET_break_op (0);
  GNUNET_SERVER_receive_done (client,
                              GNUNET_SYSERR);
  return; @}
@end example

Here, @code{GNUNET_SERVER_receive_done} should be called to tell the
service that the request is done and the next message can be received.
The argument @code{GNUNET_SYSERR} here indicates that the service did not
understand the request message, and the processing of this request will
be terminated.

In comparison to the aforementioned situation, when the argument is
@code{GNUNET_OK}, the service will continue to process the request
message.
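
The same validation logic can be exercised outside of GNUnet in plain C.
The following hedged sketch uses simplified stand-in types for
illustration (not the real GNUnet headers) and checks both the minimum
size and the 0-termination of the trailing transport name:

@example
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for the GNUnet types; illustration only. */
struct MessageHeader @{
  uint16_t size;
  uint16_t type;
@};

struct AddressLookupMessage @{
  struct MessageHeader header;
  int32_t numeric_only;
  uint32_t addrlen;
@};

/* Returns 1 if the buffer holds a plausibly well-formed message. */
static int
validate (const char *buf, uint16_t size)
@{
  const struct AddressLookupMessage *msg;
  uint32_t addrlen;

  if (size < sizeof (struct AddressLookupMessage))
    return 0;                /* too small for the fixed part */
  msg = (const struct AddressLookupMessage *) buf;
  addrlen = ntohl (msg->addrlen);
  if (size <= sizeof (struct AddressLookupMessage) + addrlen)
    return 0;                /* no room for address and name */
  /* the transport name must end with a 0-terminator */
  return '\0' == buf[size - 1];
@}

int
main (void)
@{
  size_t len = sizeof (struct AddressLookupMessage) + 4 + 4;
  char *buf = calloc (1, len);
  struct AddressLookupMessage *msg = (struct AddressLookupMessage *) buf;

  msg->addrlen = htonl (4);
  memcpy (buf + sizeof (*msg) + 4, "tcp", 4); /* includes terminator */
  printf ("valid: %d, truncated: %d\n",
          validate (buf, len), validate (buf, len - 1));
  free (buf);
  return 0;
@}
@end example

Truncating the buffer by a single byte makes the check fail, which is
exactly the condition the server guards against before touching the data.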

@c ***********************************************************************
@node Server - Response to client
@subsubsection Server - Response to client
@c %**end of header

Once the processing of the current request is done, the server should
send a response to the client. A new @code{struct AddressLookupMessage}
is produced by the server in a similar way as in the client and sent
back, but here the type should be
@code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY} rather than
@code{GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_LOOKUP} as in the client.
@example
struct AddressLookupMessage *msg;
size_t len = sizeof (struct AddressLookupMessage)
  + addressLen
  + strlen (nameTrans) + 1;
msg = GNUNET_malloc (len);
msg->header.size = htons (len);
msg->header.type = htons
  (GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);

// ...

struct GNUNET_SERVER_TransmitContext *tc;
tc = GNUNET_SERVER_transmit_context_create (client);
GNUNET_SERVER_transmit_context_append_data
(tc,
 NULL,
 0,
 GNUNET_MESSAGE_TYPE_TRANSPORT_ADDRESS_REPLY);
GNUNET_SERVER_transmit_context_run (tc, rtimeout);
@end example


Note that there are also a number of other APIs the service can use to
send messages.

@c ***********************************************************************
@node Server - Notification of clients
@subsubsection Server - Notification of clients
@c %**end of header

Often a service needs to (repeatedly) transmit notifications to a client
or a group of clients. In these cases, the client typically has once
registered for a set of events and then needs to receive a message
whenever such an event happens (until the client disconnects). The use of
a notification context can help manage message queues to clients and
handle disconnects. Notification contexts can be used to send
individualized messages to a particular client or to broadcast messages
to a group of clients. An individualized notification might look like
this:

@example
GNUNET_SERVER_notification_context_unicast(nc,
                                           client,
                                           msg,
                                           GNUNET_YES);
@end example


Note that after processing the original registration message for
notifications, the server code still typically needs to call
@code{GNUNET_SERVER_receive_done} so that the client can transmit further
messages to the server.

@c ***********************************************************************
@node Conversion between Network Byte Order (Big Endian) and Host Byte Order
@subsubsection Conversion between Network Byte Order (Big Endian) and Host Byte Order
@c %** subsub? it's a referenced page on the ipc document.
@c %**end of header

Here we can simply equate big endian with Network Byte Order and, on most
common machines, little endian with Host Byte Order. What is the
difference between the two?

Host Byte Order depends on the machine architecture. On little-endian
machines (such as the common x86 family), an integer occupying, say,
4 bytes in RAM is stored with its least significant byte at the lowest
address and its most significant byte at the highest address. Network
Byte Order is defined the opposite way: it is big endian, so the most
significant byte is stored at the lowest address.

When two hosts communicate over the network, they exchange data packets.
In order to preserve the meaning of the data during transmission,
multi-byte values must be converted to Network Byte Order before sending
and converted back to Host Byte Order after receiving.

There are ten convenient functions for byte order conversion in GNUnet,
as follows:

@table @asis

@item @code{uint16_t htons (uint16_t hostshort)}
Convert a 16-bit value from host to network byte order.

@item @code{uint32_t htonl (uint32_t hostlong)}
Convert a 32-bit value from host to network byte order.

@item @code{uint16_t ntohs (uint16_t netshort)}
Convert a 16-bit value from network to host byte order.

@item @code{uint32_t ntohl (uint32_t netlong)}
Convert a 32-bit value from network to host byte order.

@item @code{unsigned long long GNUNET_ntohll (unsigned long long netlonglong)}
Convert a 64-bit value from network to host byte order.

@item @code{unsigned long long GNUNET_htonll (unsigned long long hostlonglong)}
Convert a 64-bit value from host to network byte order.

@item @code{struct GNUNET_TIME_RelativeNBO GNUNET_TIME_relative_hton (struct GNUNET_TIME_Relative a)}
Convert relative time to network byte order.

@item @code{struct GNUNET_TIME_Relative GNUNET_TIME_relative_ntoh (struct GNUNET_TIME_RelativeNBO a)}
Convert relative time from network byte order.

@item @code{struct GNUNET_TIME_AbsoluteNBO GNUNET_TIME_absolute_hton (struct GNUNET_TIME_Absolute a)}
Convert absolute time to network byte order.

@item @code{struct GNUNET_TIME_Absolute GNUNET_TIME_absolute_ntoh (struct GNUNET_TIME_AbsoluteNBO a)}
Convert absolute time from network byte order.
@end table
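
The effect of these conversions can be observed with a short,
self-contained C program; it uses only the standard @code{htonl} and
@code{ntohl} functions (not GNUnet code) and inspects the individual
bytes of the converted value:

@example
#include <arpa/inet.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

int
main (void)
@{
  uint32_t host = 0x0A0B0C0D;   /* value in host byte order */
  uint32_t net = htonl (host);  /* same value in network byte order */
  unsigned char bytes[4];

  memcpy (bytes, &net, sizeof (net));
  /* In network byte order, the most significant byte (0x0A)
     always ends up at the lowest address. */
  printf ("first byte on the wire: 0x%02X\n", bytes[0]);
  /* ntohl restores the original host-order value. */
  printf ("round trip ok: %d\n", ntohl (net) == host);
  return 0;
@}
@end example

On a big-endian host the conversions are no-ops, but the observable
result is the same: the most significant byte always goes first on the
wire.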

@cindex Cryptography API
@node Cryptography API
@subsection Cryptography API
@c %**end of header

The gnunetutil library provides the cryptographic primitives used in
GNUnet. GNUnet uses 2048-bit RSA keys for the session key exchange, for
signing messages by peers and for most other public-key operations. Most
researchers in cryptography consider 2048-bit RSA keys secure and
practically unbreakable for a long time. The API provides functions to
create a fresh key pair, read a private key from a file (or create a new
file if the file does not exist), encrypt, decrypt, sign and verify, and
extract the public key into a format suitable for network transmission.

For the encryption of files and the actual data exchanged between peers,
GNUnet uses 256-bit AES encryption. Fresh session keys are negotiated
for every new connection. Again, there is no published technique to
break this cipher in any realistic amount of time. The API provides
functions for generation of keys, validation of keys (important for
checking that decryptions using RSA succeeded), encryption and decryption.

GNUnet uses SHA-512 for computing one-way hash codes. The API provides
functions to compute a hash over a block in memory or over a file on disk.

The crypto API also provides functions for randomizing a block of memory,
obtaining a single random number and generating a permutation of the
numbers 0 to n-1. Random number generation distinguishes between WEAK and
STRONG random number quality; WEAK random numbers are pseudo-random
whereas STRONG random numbers use entropy gathered from the operating
system.

Finally, the crypto API provides a means to deterministically generate a
1024-bit RSA key from a hash code. These functions should most likely not
be used by most applications; most importantly,
@code{GNUNET_CRYPTO_rsa_key_create_from_hash} does not create an RSA key
that should be considered secure for traditional applications of RSA.

@cindex Message Queue API
@node Message Queue API
@subsection Message Queue API
@c %**end of header

@strong{ Introduction }@
Often, applications need to queue messages that
are to be sent to other GNUnet peers, clients or services. Since all of
GNUnet's message-based communication APIs, by design, do not allow
messages to be queued, it is common to implement custom message queues
manually where they are needed. However, writing very similar code in
multiple places is tedious and leads to code duplication.

MQ (for Message Queue) is an API that provides the functionality to
implement and use message queues. We intend to eventually replace all of
the custom message queue implementations in GNUnet with MQ.

@strong{ Basic Concepts }@
The two most important entities in MQ are queues and envelopes.

Every queue is backed by a specific implementation (e.g. for mesh, stream,
connection, server client, etc.) that will actually deliver the queued
messages. For convenience, some queues also allow specifying a list of
message handlers. The message queue will then also wait for incoming
messages and dispatch them appropriately.

An envelope holds the memory for a message, as well as metadata
(Where is the envelope queued? What should happen after it has been
sent?). Any envelope can only be queued in one message queue.

@strong{ Creating Queues }@
The following is a list of currently available message queues. Note that
to avoid layering issues, message queues for higher level APIs are not
part of @code{libgnunetutil}, but@ the respective API itself provides the
queue implementation.

@table @asis

@item @code{GNUNET_MQ_queue_for_connection_client}
Transmits queued messages over a @code{GNUNET_CLIENT_Connection} handle.
Also supports receiving with message handlers.

@item @code{GNUNET_MQ_queue_for_server_client}
Transmits queued messages over a @code{GNUNET_SERVER_Client} handle. Does
not support incoming message handlers.

@item @code{GNUNET_MESH_mq_create} Transmits queued messages over a
@code{GNUNET_MESH_Tunnel} handle. Does not support incoming message
handlers.

@item @code{GNUNET_MQ_queue_for_callbacks} This is the most general
implementation. Instead of delivering and receiving messages with one of
GNUnet's communication APIs, implementation callbacks are called. Refer to
"Implementing Queues" for a more detailed explanation.
@end table


@strong{ Allocating Envelopes }@
A GNUnet message (as defined by the GNUNET_MessageHeader) has three
parts: The size, the type, and the body.

MQ provides macros to allocate an envelope containing a message
conveniently, automatically setting the size and type fields of the
message.

Consider the following simple message, with the body consisting of a
single number value.

@example
struct NumberMessage @{
  /** Type: GNUNET_MESSAGE_TYPE_EXAMPLE_1 */
  struct GNUNET_MessageHeader header;
  uint32_t number GNUNET_PACKED;
@};
@end example

An envelope containing an instance of the NumberMessage can be
constructed like this:

@example
struct GNUNET_MQ_Envelope *ev;
struct NumberMessage *msg;
ev = GNUNET_MQ_msg (msg, GNUNET_MESSAGE_TYPE_EXAMPLE_1);
msg->number = htonl (42);
@end example

In the above code, @code{GNUNET_MQ_msg} is a macro. The return value is
the newly allocated envelope. The first argument must be a pointer to some
@code{struct} containing a @code{struct GNUNET_MessageHeader header}
field, while the second argument is the desired message type, in host
byte order.

The @code{msg} pointer now points to an allocated message, where the
message type and the message size are already set. The message's size is
inferred from the type of the @code{msg} pointer: it will be set to
@code{sizeof(*msg)}, properly converted to network byte order.

If the message body's size is dynamic, the macro
@code{GNUNET_MQ_msg_extra} can be used to allocate an envelope whose
message has additional space allocated after the @code{msg} structure.

If no structure has been defined for the message,
@code{GNUNET_MQ_msg_header_extra} can be used to allocate additional space
after the message header. The first argument then must be a pointer to a
@code{GNUNET_MessageHeader}.

@strong{Envelope Properties}@
A few functions in MQ allow to set additional properties on envelopes:

@table @asis

@item @code{GNUNET_MQ_notify_sent} Allows specifying a function that will
be called once the envelope's message has been sent irrevocably.
An envelope can be canceled precisely up to the point where the notify
sent callback has been called.

@item @code{GNUNET_MQ_disable_corking} No corking will be used when
sending the message. Not every queue supports this flag; by default,
envelopes are sent with corking.

@end table


@strong{Sending Envelopes}@
Once an envelope has been constructed, it can be queued for sending with
@code{GNUNET_MQ_send}.

Note that in order to avoid memory leaks, an envelope must either be sent
(the queue will free it) or destroyed explicitly with
@code{GNUNET_MQ_discard}.

@strong{Canceling Envelopes}@
An envelope queued with @code{GNUNET_MQ_send} can be canceled with
@code{GNUNET_MQ_cancel}. Note that after the notify sent callback has
been called, canceling a message results in undefined behavior.
Thus it is unsafe to cancel an envelope that does not have a notify sent
callback. When canceling an envelope, it is not necessary to call
@code{GNUNET_MQ_discard}, and the envelope can't be sent again.

@strong{ Implementing Queues }@
@code{TODO}

@cindex Service API
@node Service API
@subsection Service API
@c %**end of header

Most GNUnet code lives in the form of services. Services are processes
that offer an API for other components of the system to build on. Those
other components can be command-line tools for users, graphical user
interfaces or other services. Services provide their API using an IPC
protocol; for this, each service must listen on either a TCP port or a
UNIX domain socket, which the service implementation does using the
server API. This use of the server API is exposed directly to users of
the service API; thus, when using the service API, one is usually also
using large parts of the server API. The service API provides various
convenience functions, such as parsing command-line arguments and the
configuration file, which are not found in the server API.
The dual to the service/server API is the client API, which can be used to
access services.

The most common way to start a service is to use the
@code{GNUNET_SERVICE_run} function from the program's main function.
@code{GNUNET_SERVICE_run} will then parse the command line and
configuration files and, based on the options found there,
start the server. It will then give back control to the main
program, passing the server and the configuration to the
@code{GNUNET_SERVICE_Main} callback. @code{GNUNET_SERVICE_run}
will also take care of starting the scheduler loop.
If this is inappropriate (for example, because the scheduler loop
is already running), @code{GNUNET_SERVICE_start} and
related functions provide an alternative to @code{GNUNET_SERVICE_run}.

When starting a service, the service_name option is used to determine
which sections in the configuration file should be used to configure the
service. A typical value here is the name of the @file{src/}
sub-directory, for example @file{statistics}.
The same string would also be given to
@code{GNUNET_CLIENT_connect} to access the service.

Once a service has been initialized, the program should use the
@code{GNUNET_SERVICE_Main} callback to register message handlers
using @code{GNUNET_SERVER_add_handlers}.
The service will already have registered a handler for the
"TEST" message.

@fnindex GNUNET_SERVICE_Options
The option bitfield (@code{enum GNUNET_SERVICE_Options})
determines how a service should behave during shutdown.
There are three key strategies:

@table @asis

@item instant (@code{GNUNET_SERVICE_OPTION_NONE})
Upon receiving the shutdown
signal from the scheduler, the service immediately terminates the server,
closing all existing connections with clients.
@item manual (@code{GNUNET_SERVICE_OPTION_MANUAL_SHUTDOWN})
The service does nothing by itself
during shutdown. The main program will need to take the appropriate
action by calling @code{GNUNET_SERVER_destroy} or
@code{GNUNET_SERVICE_stop} (depending on how the service was initialized)
to terminate the service. This method is used by gnunet-service-arm and
is rather uncommon.
@item soft (@code{GNUNET_SERVICE_OPTION_SOFT_SHUTDOWN})
Upon receiving the shutdown signal from the scheduler,
the service immediately tells the server to stop
listening for incoming clients. Requests from normal existing clients are
still processed and the server/service terminates once all normal clients
have disconnected. Clients that are not expected to ever disconnect (such
as clients that monitor performance values) can be marked as 'monitor'
clients using @code{GNUNET_SERVER_client_mark_monitor}. Those clients
will continue to be processed until all 'normal' clients have
disconnected. Then, the server will terminate, closing the monitor
connections. This mode is for example used by 'statistics', allowing
existing 'normal' clients to set (possibly persistent) statistic values
before terminating.

@end table

@c ***********************************************************************
@node Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps
@subsection Optimizing Memory Consumption of GNUnet's (Multi-) Hash Maps
@c %**end of header

A commonly used data structure in GNUnet is a (multi-)hash map. It is most
often used to map a peer identity to some data structure, but also to map
arbitrary keys to values (for example to track requests in the distributed
hash table or in file-sharing). As it is so commonly used, the hash map
is actually sometimes responsible for a large share of GNUnet's overall
memory consumption (for some processes, 30% is not uncommon). The
following text documents some API quirks (and their implications for
applications) that were recently introduced to minimize the footprint of
the hash map.


@c ***********************************************************************
@menu
* Analysis::
* Solution::
* Migration::
* Conclusion::
* Availability::
@end menu

@node Analysis
@subsubsection Analysis
@c %**end of header

The main reason for the "excessive" memory consumption by the hash map is
that GNUnet uses 512-bit cryptographic hash codes --- and the
(multi-)hash map also uses the same 512-bit 'struct GNUNET_HashCode'. As
a result, storing just the keys requires 64 bytes of memory for each key.
As some applications like to keep a large number of entries in the hash
map (after all, that's what maps are good for), 64 bytes per hash is
significant: keeping a pointer to the value and having a linked list for
collisions consume between 8 and 16 bytes, and 'malloc' may add about the
same overhead per allocation, putting us in the 16 to 32 byte per entry
ballpark. Adding a 64-byte key then triples the overall memory
requirement for the hash map.

To make things "worse", most of the time storing the key in the hash map
is not required: it is typically already in memory elsewhere! In most
cases, the values stored in the hash map are some application-specific
struct that _also_ contains the hash. Here is a simplified example:

@example
struct MyValue @{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &key, val, ...);
@end example

This is a common pattern as later the entries might need to be removed,
and at that time it is convenient to have the key immediately at hand:

@example
GNUNET_CONTAINER_multihashmap_remove (map, &val->key, val);
@end example


Note that here we end up with two times 64 bytes for the key, plus maybe
64 bytes total for the rest of the 'struct MyValue' and the map entry in
the hash map. The resulting redundant storage of the key increases
overall memory consumption per entry from the "optimal" 128 bytes to 192
bytes. This is not just an extreme example: overheads in practice are
actually sometimes close to those highlighted in this example. This is
especially true for maps with a significant number of entries, as there
we tend to really try to keep the entries small.

@c ***********************************************************************
@node Solution
@subsubsection Solution
@c %**end of header

The solution that has now been implemented is to @strong{optionally}
allow the hash map to not make a (deep) copy of the hash but instead keep
a pointer to the hash/key in the entry. This reduces the memory
consumption for the key from 64 bytes to 4 to 8 bytes. However, it can
only work if the key is actually stored in the entry (which is the
case most of the time) and if the entry does not modify the key (which in
all of the code I am aware of has always been the case if the key is
stored in the entry). Finally, when the client stores an entry in the
hash map, it @strong{must} provide a pointer to the key within the entry,
not just a pointer to a transient location of the key. If
the client code does not meet these requirements, the result is a dangling
pointer and undefined behavior of the (multi-)hash map API.
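
The savings can be illustrated with a small, stand-alone C sketch. The
struct names below are hypothetical stand-ins, not the real
@code{libgnunetutil} internals; only the 64-byte key size matches the
real @code{struct GNUNET_HashCode}:

@example
#include <stdint.h>
#include <stdio.h>

/* Stand-in for the 512-bit struct GNUNET_HashCode. */
struct HashCode @{
  uint32_t bits[16];
@};

/* Simplified map entry that deep-copies the key. */
struct EntryWithCopy @{
  struct HashCode key;
  void *value;
  struct EntryWithCopy *next;
@};

/* Simplified map entry that only points at the key inside the value. */
struct EntryWithPointer @{
  const struct HashCode *key;
  void *value;
  struct EntryWithPointer *next;
@};

int
main (void)
@{
  printf ("key itself: %zu bytes\n", sizeof (struct HashCode));
  printf ("entry with key copy: %zu bytes\n",
          sizeof (struct EntryWithCopy));
  printf ("entry with key pointer: %zu bytes\n",
          sizeof (struct EntryWithPointer));
  return 0;
@}
@end example

On a typical 64-bit platform, the pointer variant replaces the 64-byte
embedded key with an 8-byte pointer, which is where the per-entry saving
comes from.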

@c ***********************************************************************
@node Migration
@subsubsection Migration
@c %**end of header

To use the new feature, first check that the values contain the respective
key (and never modify it). Then, all calls to
@code{GNUNET_CONTAINER_multihashmap_put} on the respective map must be
audited and most likely changed to pass a pointer into the value's struct.
For the initial example, the new code would look like this:

@example
struct MyValue @{
  struct GNUNET_HashCode key;
  unsigned int my_data;
@};

// ...
val = GNUNET_malloc (sizeof (struct MyValue));
val->key = key;
val->my_data = 42;
GNUNET_CONTAINER_multihashmap_put (map, &val->key, val, ...);
@end example


Note that @code{&val} was changed to @code{&val->key} in the argument to
the @code{put} call. This is critical as often @code{key} is on the stack
or in some other transient data structure and thus having the hash map
keep a pointer to @code{key} would not work. Only the key inside of
@code{val} has the same lifetime as the entry in the map (this must of
course be checked as well). Naturally, @code{val->key} must be
initialized before the @code{put} call. Once all @code{put} calls have
been converted and double-checked, you can change the call to create the
hash map from

@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_NO);
@end example

to

@example
map = GNUNET_CONTAINER_multihashmap_create (SIZE, GNUNET_YES);
@end example

If everything was done correctly, you now use about 60 bytes less memory
per entry in @code{map}. However, if now (or in the future) any call to
@code{put} does not ensure that the given key is valid until the entry is
removed from the map, undefined behavior is likely to be observed.

@c ***********************************************************************
@node Conclusion
@subsubsection Conclusion
@c %**end of header

The new optimization is often applicable and can result in a
reduction in memory consumption of up to 30% in practice. However, it
makes the code less robust as additional invariants are imposed on the
multi hash map client. Thus, applications should refrain from enabling
the new mode unless the resulting performance increase is deemed
significant enough. In particular, it should generally not be used in new
code (wait at least until benchmarks exist).

@c ***********************************************************************
@node Availability
@subsubsection Availability
@c %**end of header

The new multi hash map code was committed in SVN 24319 (will be in GNUnet
0.9.4). Various subsystems (transport, core, dht, file-sharing) were
previously audited and modified to take advantage of the new capability.
In particular, memory consumption of the file-sharing service is expected
to drop by 20-30% due to this change.


@cindex CONTAINER_MDLL API
@node CONTAINER_MDLL API
@subsection CONTAINER_MDLL API
@c %**end of header

This text documents the GNUNET_CONTAINER_MDLL API. The
GNUNET_CONTAINER_MDLL API is similar to the GNUNET_CONTAINER_DLL API in
that it provides operations for the construction and manipulation of
doubly-linked lists. The key difference to the (simpler) DLL-API is that
the MDLL-version allows a single element (instance of a "struct") to be
in multiple linked lists at the same time.

Like the DLL API, the MDLL API stores (most of) the data structures for
the doubly-linked list with the respective elements; only the 'head' and
'tail' pointers are stored "elsewhere" --- and the application needs to
provide the locations of head and tail to each of the calls in the
MDLL API. The key difference for the MDLL API is that the "next" and
"previous" pointers in the struct can no longer be simply called "next"
and "prev" --- after all, the element may be in multiple doubly-linked
lists, so we cannot just have one "next" and one "prev" pointer!

The solution is to have multiple fields that must have a name of the
format "next_XX" and "prev_XX" where "XX" is the name of one of the
doubly-linked lists. Here is a simple example:

@example
struct MyMultiListElement @{
  struct MyMultiListElement *next_ALIST;
  struct MyMultiListElement *prev_ALIST;
  struct MyMultiListElement *next_BLIST;
  struct MyMultiListElement *prev_BLIST;
  void *data;
@};
@end example


Note that by convention, we use all-uppercase letters for the list names.
In addition, the program needs to have a location for the head and tail
pointers for both lists, for example:

@example
static struct MyMultiListElement *head_ALIST;
static struct MyMultiListElement *tail_ALIST;
static struct MyMultiListElement *head_BLIST;
static struct MyMultiListElement *tail_BLIST;
@end example


Using the MDLL-macros, we can now insert an element into the ALIST:

@example
GNUNET_CONTAINER_MDLL_insert (ALIST, head_ALIST, tail_ALIST, element);
@end example


Passing "ALIST" as the first argument to MDLL specifies which of the
next/prev fields in the 'struct MyMultiListElement' should be used. The
extra "ALIST" argument and the "_ALIST" in the names of the
next/prev-members are the only differences between the MDLL and DLL-API.
Like the DLL-API, the MDLL-API offers functions for inserting (at head,
at tail, after a given element) and removing elements from the list.
Iterating over the list should be done by directly accessing the
"next_XX" and/or "prev_XX" members.

@cindex Automatic Restart Manager
@cindex ARM
@node Automatic Restart Manager (ARM)
@section Automatic Restart Manager (ARM)
@c %**end of header

GNUnet's Automated Restart Manager (ARM) is the GNUnet service responsible
for system initialization and service babysitting. ARM starts and halts
services, detects configuration changes and restarts services impacted by
the changes as needed. It is also responsible for restarting services in
case of crashes, and it is planned to incorporate automatic debugging for
diagnosing service crashes, providing developers with insights about
crash causes. The purpose of this document is to give GNUnet developers
an idea of how ARM works and how to interact with it.

@menu
* Basic functionality::
* Key configuration options::
* ARM - Availability::
* Reliability::
@end menu

@c ***********************************************************************
@node Basic functionality
@subsection Basic functionality
@c %**end of header

@itemize @bullet
@item ARM source code can be found under "src/arm".@ Service processes are
managed by the functions in "gnunet-service-arm.c", which is controlled
using "gnunet-arm.c" (the main function in that file is ARM's entry point).

@item The functions responsible for communicating with ARM, and for
starting and stopping services (including the ARM service itself), are
provided by the ARM API "arm_api.c".@ The function GNUNET_ARM_connect()
returns an ARM handle bound to the caller's context (configuration and
scheduler in use). This handle can afterwards be used by the caller to
communicate with ARM. The functions GNUNET_ARM_start_service() and
GNUNET_ARM_stop_service() are used for starting and stopping services
respectively.

@item A typical example of using these basic ARM services can be found in
file test_arm_api.c. The test case connects to ARM, starts it, then uses
it to start the service "resolver", stops the "resolver", and then stops ARM.
@end itemize

@c ***********************************************************************
@node Key configuration options
@subsection Key configuration options
@c %**end of header

Configurations for ARM and services should be available in a .conf file
(As an example, see test_arm_api_data.conf). When running ARM, the
configuration file to use should be passed to the command:

@example
$ gnunet-arm -s -c configuration_to_use.conf
@end example

If no configuration is passed, the default configuration file will be used
(see GNUNET_PREFIX/share/gnunet/defaults.conf which is created from
contrib/defaults.conf).@ Each service has a configuration section named
after the service name in square brackets, for example: "[arm]".
The following options configure how ARM configures or interacts with the
various services:

@table @asis

@item PORT Port number on which the service is listening for incoming TCP
connections. ARM will start the service should it notice a request at
this port.

@item HOSTNAME Specifies on which host the service is deployed. Note
that ARM can only start services that are running on the local system
(but will not check that the hostname matches the local machine name).
This option is used by the @code{gnunet_client_lib.h} implementation to
determine which system to connect to. The default is "localhost".

@item BINARY The name of the service binary file.

@item OPTIONS To be passed to the service.

@item PREFIX A command to pre-pend to the actual command, for example,
running a service with "valgrind" or "gdb"

@item DEBUG Run in debug mode (very verbose output).

@item AUTOSTART ARM will listen on the UNIX domain socket and/or TCP port
of the service and start the service on demand.

@item FORCESTART ARM will always start this service when the peer
is started.

@item ACCEPT_FROM IPv4 addresses the service accepts connections from.

@item ACCEPT_FROM6 IPv6 addresses the service accepts connections from.

@end table
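Putting these options together, a service section might look like the
following sketch. The service name, port number and values here are
purely illustrative; consult the installed defaults for the real
settings of any particular service.

@example
[resolver]
PORT = 2089
HOSTNAME = localhost
BINARY = gnunet-service-resolver
AUTOSTART = YES
PREFIX = valgrind
ACCEPT_FROM = 127.0.0.1;
ACCEPT_FROM6 = ::1;
@end example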


Options that impact the operation of ARM overall are in the "[arm]"
section. ARM is a normal service and has (except for AUTOSTART) all of the
options that other services do. In addition, ARM has the
following options:

@table @asis

@item GLOBAL_PREFIX Command to be pre-pended to all services that are
going to run.

@item GLOBAL_POSTFIX Global option that will be supplied to all the
services that are going to run.

@end table

@c ***********************************************************************
@node ARM - Availability
@subsection ARM - Availability
@c %**end of header

As mentioned before, one of the features provided by ARM is starting
services on demand. Consider the example of a "client" service that
wants to connect to a "server" service. The "client" asks ARM to run the
"server"; ARM starts the "server"; the "server" starts listening for
incoming connections; the "client" establishes a connection with the
"server"; and then the two start communicating.@ One problem with this
scheme is that it is slow: the "client" wants to communicate with the
"server" immediately and is not willing to wait for it to be started and
listening for incoming connections before serving its request.@ One
possible solution would be for ARM to start all services by default.
That would solve the problem, but it is not practical, since some of the
started services might never be used, or might only be used after a
relatively long time.@
The approach followed by ARM to solve this problem is as follows:

@itemize @bullet

@item For each service that has a PORT field in the configuration file
and that is not one of the default services (i.e. a service that accepts
incoming connections from clients), ARM creates listening sockets for all
addresses associated with that service.

@item The "client" will immediately establish a connection with
the "server".

@item ARM --- pretending to be the "server" --- will listen on the
respective port and notice the incoming connection from the "client"
(but will not accept it).

@item Once there is an incoming connection, ARM will start the "server",
passing on the listen sockets (now, the service is started and can do its
work).

@item Other client services can now connect directly to the
"server".

@end itemize

@c ***********************************************************************
@node Reliability
@subsection Reliability

One of the features provided by ARM is the automatic restart of crashed
services.@ ARM needs to know which of the running services died. The
function @code{maint_child_death()} in "gnunet-service-arm.c" is
responsible for that. It is scheduled to run upon receiving a SIGCHLD
signal. The function then iterates over ARM's list of running services
and determines which services have died (crashed). ARM restarts each
crashed service.@
Now consider the case of a service with a serious problem that causes it
to crash each time it is started by ARM. If ARM kept blindly restarting
such a service, we would get the pattern
start-crash-restart-crash-restart-crash and so forth, which is of course
not practical.@
For that reason, ARM schedules the service to be restarted after a delay
that grows exponentially with each crash/restart of that
service.@ To clarify the idea, consider the following example:

@itemize @bullet

@item Service S crashed.

@item ARM receives the SIGCHLD and inspects its list of services to find
the dead one(s).

@item ARM finds S dead and schedules it for restarting after a "backoff"
time, which is initially set to 1 ms. ARM then doubles the backoff time
for S (now backoff(S) = 2 ms).

@item Because there is a severe problem with S, it crashes again.

@item Again, ARM receives the SIGCHLD and detects that it is S again that
crashed. ARM schedules it for restarting, but after its new backoff time
(which became 2 ms), and doubles its backoff time (now backoff(S) = 4 ms).

@item And so on, until backoff(S) reaches a certain threshold
(@code{EXPONENTIAL_BACKOFF_THRESHOLD}, which is set to half an hour).
After reaching it, backoff(S) remains at half an hour, so ARM will not
spend a lot of time trying to restart a
problematic service.
@end itemize

@cindex TRANSPORT Subsystem
@node TRANSPORT Subsystem
@section TRANSPORT Subsystem
@c %**end of header

This chapter documents how the GNUnet transport subsystem works. The
GNUnet transport subsystem consists of three main components: the
transport API (the interface used by the rest of the system to access the
transport service), the transport service itself (most of the interesting
functions, such as choosing transports, happens here) and the transport
plugins. A transport plugin is a concrete implementation for how two
GNUnet peers communicate; many plugins exist, for example for
communication via TCP, UDP, HTTP, HTTPS and others. Finally, the
transport subsystem uses supporting code, especially the NAT/UPnP
library to help with tasks such as NAT traversal.

Key tasks of the transport service include:

@itemize @bullet

@item Create our HELLO message, notify clients and neighbours if our HELLO
changes (using NAT library as necessary)

@item Validate HELLOs from other peers (send PING), allow other peers to
validate our HELLO's addresses (send PONG)

@item Upon request, establish connections to other peers (using address
selection from ATS subsystem) and maintain them (again using PINGs and
PONGs) as long as desired

@item Accept incoming connections, give ATS service the opportunity to
switch communication channels

@item Notify clients about peers that have connected to us or that have
been disconnected from us

@item If a (stateful) connection goes down unexpectedly (without explicit
DISCONNECT), quickly attempt to recover (without notifying clients) but do
notify clients quickly if reconnecting fails

@item Send (payload) messages arriving from clients to other peers via
transport plugins and receive messages from other peers, forwarding
those to clients

@item Enforce inbound traffic limits (using flow-control if it is
applicable); outbound traffic limits are enforced by CORE, not by us (!)

@item Enforce restrictions on P2P connection as specified by the blacklist
configuration and blacklisting clients
@end itemize

Note that the term "clients" in the list above really refers to the
GNUnet-CORE service, as CORE is typically the only client of the
transport service.

@menu
* Address validation protocol::
@end menu

@node Address validation protocol
@subsection Address validation protocol
@c %**end of header

This section documents how the GNUnet transport service validates
connections with other peers. It is a high-level description of the
protocol necessary to understand the details of the implementation. It
should be noted that when we talk about PING and PONG messages in this
section, we refer to transport-level PING and PONG messages, which are
different from core-level PING and PONG messages (both in implementation
and function).

The goal of transport-level address validation is to minimize the chances
of a successful man-in-the-middle attack against GNUnet peers on the
transport level. Such an attack would not allow the adversary to decrypt
the P2P transmissions, but a successful attacker could at least measure
traffic volumes and latencies (raising the adversary's capabilities by
those of a global passive adversary in the worst case). The scenario we
are concerned about is an attacker, Mallory, giving a @code{HELLO} to
Alice that claims to be for Bob, but contains Mallory's IP address
instead of Bob's (for some transport).
Mallory would then forward the traffic to Bob (by initiating a
connection to Bob and claiming to be Alice). As a further
complication, the scheme has to work even if say Alice is behind a NAT
without traversal support and hence has no address of her own (and thus
Alice must always initiate the connection to Bob).

An additional constraint is that @code{HELLO} messages do not contain a
cryptographic signature since other peers must be able to edit
(i.e. remove) addresses from the @code{HELLO} at any time (this was
not true in GNUnet 0.8.x). A basic @strong{assumption} is that each peer
knows the set of possible network addresses that it @strong{might}
be reachable under (so for example, the external IP address of the
NAT plus the LAN address(es) with the respective ports).

The solution is the following. If Alice wants to validate that a given
address for Bob is valid (i.e. is actually established @strong{directly}
with the intended target), she sends a PING message over that connection
to Bob. Note that in this case, Alice initiated the connection so only
Alice knows which address was used for sure (Alice may be behind NAT, so
whatever address Bob sees may not be an address Alice knows she has).
Bob checks that the address given in the @code{PING} is actually one
of Bob's addresses (i.e. does not belong to Mallory), and if it is,
sends back a @code{PONG} (with a signature that says that Bob
owns/uses the address from the @code{PING}).
Alice checks the signature and is happy if it is valid and the address
in the @code{PONG} is the address Alice used.
This is similar to the 0.8.x protocol where the @code{HELLO} contained a
signature from Bob for each address used by Bob.
Here, the purpose code for the signature is
@code{GNUNET_SIGNATURE_PURPOSE_TRANSPORT_PONG_OWN}. After this, Alice will
remember Bob's address and consider the address valid for a while (12h in
the current implementation). Note that after this exchange, Alice only
considers Bob's address to be valid, the connection itself is not
considered 'established'. In particular, Alice may have many addresses
for Bob that Alice considers valid.

@c TODO: reference Footnotes so that I don't have to duplicate the
@c footnotes or add them to an index at the end. Is this possible at
@c all in Texinfo?
The @code{PONG} message is protected with a nonce/challenge against replay
attacks@footnote{@uref{http://en.wikipedia.org/wiki/Replay_attack, replay}}
and uses an expiration time for the signature (but those are almost
implementation details).

@cindex NAT library
@node NAT library
@section NAT library
@c %**end of header

The goal of the GNUnet NAT library is to provide a general-purpose API for
NAT traversal @strong{without} third-party support. So protocols that
involve contacting a third peer to help establish a connection between
two peers are outside of the scope of this API. That does not mean that
GNUnet doesn't support involving a third peer (we can do this with the
distance-vector transport or using application-level protocols), it just
means that the NAT API is not concerned with this possibility. The API is
written so that it will work for IPv6-NAT in the future as well as
current IPv4-NAT. Furthermore, the NAT API is always used, even for peers
that are not behind NAT --- in that case, the mapping provided is simply
the identity.

NAT traversal is initiated by calling @code{GNUNET_NAT_register}. Given a
set of addresses that the peer has locally bound to (TCP or UDP), the NAT
library will return (via callback) a (possibly longer) list of addresses
the peer @strong{might} be reachable under. Internally, depending on the
configuration, the NAT library will try to punch a hole (using UPnP) or
just "know" that the NAT was manually punched and generate the respective
external IP address (the one that should be globally visible) based on
the given information.

The NAT library also supports ICMP-based NAT traversal. Here, the other
peer can request connection-reversal by this peer (in this special case,
the peer is even allowed to configure a port number of zero). If the NAT
library detects a connection-reversal request, it returns the respective
target address to the client as well. It should be noted that
connection-reversal is currently only intended for TCP, so other plugins
@strong{must} pass @code{NULL} for the reversal callback. Naturally, the
NAT library also supports requesting connection reversal from a remote
peer (@code{GNUNET_NAT_run_client}).

Once initialized, the NAT handle can be used to test if a given address is
possibly a valid address for this peer (@code{GNUNET_NAT_test_address}).
This is used for validating our addresses when generating PONGs.

Finally, the NAT library contains an API to test if our NAT configuration
is correct. Using @code{GNUNET_NAT_test_start} @strong{before} binding to
the respective port, the NAT library can be used to test if the
configuration works. The test function acts as a local client,
initializes the NAT traversal and then contacts a @code{gnunet-nat-server}
(running by default on @code{gnunet.org}) and asks for a connection to be
established.
This way, it is easy to test if the current NAT configuration is valid.

@node Distance-Vector plugin
@section Distance-Vector plugin
@c %**end of header

The Distance Vector (DV) transport is a transport mechanism that allows
peers to act as relays for each other, thereby connecting peers that would
otherwise be unable to connect. This gives a larger connection set to
applications that may work better with more peers to choose from (for
example, File Sharing and/or DHT).

The Distance Vector transport essentially has two functions. The first is
"gossiping" connection information about more distant peers to directly
connected peers. The second is taking messages intended for non-directly
connected peers and encapsulating them in a DV wrapper that contains the
required information for routing the message through forwarding peers. Via
gossiping, optimal routes through the known DV neighborhood are discovered
and utilized and the message encapsulation provides some benefits in
addition to simply getting the message from the correct source to the
proper destination.

The gossiping function of DV provides an up-to-date routing table of
peers that are available up to some number of hops. We call this a
fisheye view of the network (as with a fisheye lens, nearby objects are
known while more distant ones are unknown). Gossip messages are sent
only to directly connected peers, but they are sent about other known
peers within the "fisheye distance". Whenever two peers connect, they
immediately gossip to each other about their appropriate other
neighbors. They also gossip about the newly connected peer to previously
connected neighbors. In order to keep the routing tables up to date,
disconnect notifications are propagated as gossip as well (because
disconnects may not be sent/received, timeouts are also used to remove
stagnant routing table entries).

Routing of messages via DV is straightforward. When the DV transport is
notified of a message destined for a non-direct neighbor, the appropriate
forwarding peer is selected, and the base message is encapsulated in a DV
message which contains information about the initial peer and the intended
recipient. At each forwarding hop, the initial peer is validated (the
forwarding peer ensures that it has the initial peer in its neighborhood,
otherwise the message is dropped). Next the base message is
re-encapsulated in a new DV message for the next hop in the forwarding
chain (or delivered to the current peer, if it has arrived at the
destination).

Assume a three peer network with peers Alice, Bob and Carol. Assume that

@example
Alice <-> Bob and Bob <-> Carol
@end example

@noindent
are direct (e.g. over TCP or UDP transports) connections, but that
Alice cannot directly connect to Carol.
This may be the case due to NAT or firewall restrictions, or perhaps
based on one of the peers respective configurations. If the Distance
Vector transport is enabled on all three peers, it will automatically
discover (from the gossip protocol) that Alice and Carol can connect via
Bob and provide a "virtual" Alice <-> Carol connection. Routing between
Alice and Carol happens as follows: Alice creates a message destined for
Carol and notifies the DV transport about it. The DV transport at Alice
looks up Carol in the routing table and finds that the message must be
sent through Bob for Carol. The message is encapsulated setting Alice as
the initiator and Carol as the destination, and sent to Bob. Bob receives
the message, verifies that both Alice and Carol are known to Bob, and
re-wraps the message in a new DV message for Carol.
The DV transport at Carol receives this message, unwraps the original
message, and delivers it to Carol as though it came directly from Alice.

@cindex SMTP plugin
@node SMTP plugin
@section SMTP plugin
@c %**end of header

This section describes the new SMTP transport plugin for GNUnet as it
exists in the 0.7.x and 0.8.x branch. SMTP support is currently not
available in GNUnet 0.9.x. This page also describes the transport layer
abstraction (as it existed in 0.7.x and 0.8.x) in more detail and gives
some benchmarking results. The performance results presented are quite
old and maybe outdated at this point.

@itemize @bullet
@item Why use SMTP for a peer-to-peer transport?
@item How does it work?
@item How do I configure my peer?
@item How do I test if it works?
@item How fast is it?
@item Is there any additional documentation?
@end itemize


@menu
* Why use SMTP for a peer-to-peer transport?::
* How does it work?::
* How do I configure my peer?::
* How do I test if it works?::
* How fast is it?::
@end menu

@node Why use SMTP for a peer-to-peer transport?
@subsection Why use SMTP for a peer-to-peer transport?
@c %**end of header

There are many reasons why one would not want to use SMTP:

@itemize @bullet
@item SMTP uses more bandwidth than TCP, UDP or HTTP.
@item SMTP has a much higher latency.
@item SMTP requires significantly more computation (encoding and decoding
time) for the peers.
@item SMTP is significantly more complicated to configure.
@item SMTP may be abused by tricking GNUnet into sending mail to@
non-participating third parties.
@end itemize

So why would anybody want to use SMTP?
@itemize @bullet
@item SMTP can be used to contact peers behind NAT boxes (in virtual
private networks).
@item SMTP can be used to circumvent policies that limit or prohibit
peer-to-peer traffic by masking as "legitimate" traffic.
@item SMTP uses E-mail addresses which are independent of a specific IP,
which can be useful to address peers that use dynamic IP addresses.
@item SMTP can be used to initiate a connection (e.g. initial address
exchange) and peers can then negotiate the use of a more efficient
protocol (e.g. TCP) for the actual communication.
@end itemize

In summary, SMTP can for example be used to send a message to a peer
behind a NAT box that has a dynamic IP to tell the peer to establish a
TCP connection to a peer outside of the private network. Even an
extraordinary overhead for this first message would be irrelevant in this
type of situation.

@node How does it work?
@subsection How does it work?
@c %**end of header

When a GNUnet peer needs to send a message to another GNUnet peer that has
advertised (only) an SMTP transport address, GNUnet base64-encodes the
message and sends it in an E-mail to the advertised address. The
advertisement contains a filter which is placed in the E-mail header,
such that the receiving host can filter the tagged E-mails and forward it
to the GNUnet peer process. The filter can be specified individually by
each peer and be changed over time. This makes it impossible to censor
GNUnet E-mail messages by searching for a generic filter.
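For illustration, a tagged E-mail using the filter @code{X-mailer:
GNUnet} might look roughly like this; the addresses are placeholders,
and the body would contain the base64-encoded GNUnet message:

@example
To: gnunet-peer@@example.com
From: another-peer@@example.net
X-mailer: GNUnet
Subject: GNUnet transport

(base64-encoded GNUnet message data)
@end example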

@node How do I configure my peer?
@subsection How do I configure my peer?
@c %**end of header

First, you need to configure @code{procmail} to filter your inbound E-mail
for GNUnet traffic. The GNUnet messages must be delivered into a pipe, for
example @code{/tmp/gnunet.smtp}. You also need to define a filter that is
used by @command{procmail} to detect GNUnet messages. You are free to
choose whichever filter you like, but you should make sure that it does
not occur in your other E-mail. In our example, we will use
@code{X-mailer: GNUnet}. The @code{~/.procmailrc} configuration file then
looks like this:

@example
:0:
* ^X-mailer: GNUnet
/tmp/gnunet.smtp
# where do you want your other e-mail delivered to
# (default: /var/spool/mail/)
:0:
/var/spool/mail/
@end example

After adding this file, first make sure that your regular E-mail still
works (e.g. by sending an E-mail to yourself). Then edit the GNUnet
configuration. In the section @code{SMTP} you need to specify your E-mail
address under @code{EMAIL}, your mail server (for outgoing mail) under
@code{SERVER}, the filter (X-mailer: GNUnet in the example) under
@code{FILTER} and the name of the pipe under @code{PIPE}.@ The completed
section could then look like this:

@example
EMAIL = me@@mail.gnu.org
MTU = 65000
SERVER = mail.gnu.org:25
FILTER = "X-mailer: GNUnet"
PIPE = /tmp/gnunet.smtp
@end example

Finally, you need to add @code{smtp} to the list of @code{TRANSPORTS} in
the @code{GNUNETD} section. GNUnet peers will use the E-mail address that
you specified to contact your peer until the advertisement times out.
Thus, if you are not sure if everything works properly or if you are not
planning to be online for a long time, you may want to configure this
timeout to be short, e.g. just one hour. For this, set
@code{HELLOEXPIRES} to @code{1} in the @code{GNUNETD} section.

That should be it, but you will probably want to test it first.

@node How do I test if it works?
@subsection How do I test if it works?
@c %**end of header

Any transport can be subjected to some rudimentary tests using the
@code{gnunet-transport-check} tool. The tool sends a message to the local
node via the transport and checks that a valid message is received. While
this test does not involve other peers and can not check if firewalls or
other network obstacles prohibit proper operation, this is a great
testcase for the SMTP transport since it tests nearly all of
the functionality.

@code{gnunet-transport-check} should only be used without running
@code{gnunetd} at the same time. By default, @code{gnunet-transport-check}
tests all transports that are specified in the configuration file. But
you can specifically test SMTP by giving the option
@code{--transport=smtp}.

Note that this test always checks if a transport can receive and send.
While you can configure most transports to only receive or only send
messages, this test will only work if you have configured the transport
to send and receive messages.

@node How fast is it?
@subsection How fast is it?
@c %**end of header

We have measured the performance of the UDP, TCP and SMTP transport layer
directly and when used from an application using the GNUnet core.
Measuring just the transport layer gives a better view of the actual
overhead of the protocol, whereas evaluating the transport from the
application puts the overhead into perspective from a practical point of
view.

The loopback measurements of the SMTP transport were performed on three
different machines spanning a range of modern SMTP configurations. We
used a PIII-800 running RedHat 7.3 with the Purdue Computer Science
configuration which includes filters for spam. We also used a Xeon 2 GHz
with a vanilla RedHat 8.0 sendmail configuration. Furthermore, we used
qmail on a PIII-1000 running Sorcerer GNU Linux (SGL). The numbers for
UDP and TCP are provided using the SGL configuration. The qmail benchmark
uses qmail's internal filtering whereas the sendmail benchmarks relies on
procmail to filter and deliver the mail. We used the transport layer to
send a message of b bytes (excluding transport protocol headers) directly
to the local machine. This way, network latency and packet loss on the
wire have no impact on the timings. n messages were sent sequentially over
the transport layer, sending message i+1 after the i-th message was
received. All messages were sent over the same connection and the time to
establish the connection was not taken into account since this overhead is
miniscule in practice --- as long as a connection is used for a
significant number of messages.

@multitable @columnfractions .20 .15 .15 .15 .15 .15
@headitem Transport @tab UDP @tab TCP @tab SMTP (Purdue sendmail)
@tab SMTP (RH 8.0) @tab SMTP (SGL qmail)
@item  11 bytes @tab 31 ms @tab 55 ms @tab  781 s @tab 77 s @tab 24 s
@item  407 bytes @tab 37 ms @tab 62 ms @tab  789 s @tab 78 s @tab 25 s
@item 1,221 bytes @tab 46 ms @tab 73 ms @tab  804 s @tab 78 s @tab 25 s
@end multitable

The benchmarks show that UDP and TCP are, as expected, both significantly
faster compared with any of the SMTP services. Among the SMTP
implementations, there can be significant differences depending on the
SMTP configuration. Filtering with an external tool like procmail that
needs to re-parse its configuration for each mail can be very expensive.
Applying spam filters can also significantly impact the performance of
the underlying SMTP implementation. The microbenchmark shows that SMTP
can be a viable solution for initiating peer-to-peer sessions: a couple of
seconds to connect to a peer are probably not even going to be noticed by
users. The next benchmark measures the possible throughput for a
transport. Throughput can be measured by sending multiple messages in
parallel and measuring packet loss. Note that not only UDP but also the
TCP transport can actually lose messages, since the TCP implementation
drops messages if the @code{write} to the socket would block. While the
SMTP protocol never drops messages itself, it is often so
slow that only a fraction of the messages can be sent and received in the
given time-bounds. For this benchmark we report the message loss after
allowing t time for sending m messages. If messages were not sent (or
received) after an overall timeout of t, they were considered lost. The
benchmark was performed using two Xeon 2 GHz machines running RedHat 8.0
with sendmail. The machines were connected with a direct 100 MBit Ethernet
connection. Figures udp1200, tcp1200 and smtp-MTUs show that the
throughput for messages of size 1,200 octets is 2,343 kbps, 3,310 kbps
and 6 kbps for UDP, TCP and SMTP respectively. The high per-message
overhead of SMTP can be improved by increasing the MTU; for example, an
MTU of 12,000 octets improves the throughput to 13 kbps as figure
smtp-MTUs shows. Our research paper has some more details on the
benchmarking results.

@cindex Bluetooth plugin
@node Bluetooth plugin
@section Bluetooth plugin
@c %**end of header

This page describes the new Bluetooth transport plugin for GNUnet. The
plugin is still in the testing stage, so don't expect it to work
perfectly. If you have any questions or problems, ask on the IRC channel.

@itemize @bullet
@item What do I need to use the Bluetooth plugin transport?
@item How does it work?
@item What possible errors should I be aware of?
@item How do I configure my peer?
@item How can I test it?
@end itemize

@menu
* What do I need to use the Bluetooth plugin transport?::
* How does it work2?::
* What possible errors should I be aware of?::
* How do I configure my peer2?::
* How can I test it?::
* The implementation of the Bluetooth transport plugin::
@end menu

@node What do I need to use the Bluetooth plugin transport?
@subsection What do I need to use the Bluetooth plugin transport?
@c %**end of header

If you are a GNU/Linux user and you want to use the Bluetooth
transport plugin you should install the
@command{BlueZ development libraries} (if they aren't already
installed).
For instructions about how to install the libraries you should
check out the BlueZ site
(@uref{http://www.bluez.org/, http://www.bluez.org}). If you don't know
whether you have the necessary libraries, don't worry: just run the GNUnet
configure script and you will see a notification at the end
which will warn you if you don't have the necessary libraries.

If you are a Windows user you should have installed the
@emph{MinGW}/@emph{MSys2} with the latest updates (especially the
@emph{ws2bth} header). If this is your first build of GNUnet on Windows
you should check out the SBuild repository. It will semi-automatically
assemble a @emph{MinGW}/@emph{MSys2} installation with a lot of extra
packages which are needed for the GNUnet build, which will ease your
work. Finally, you just have to be sure that you have the correct drivers
for your Bluetooth device installed and that your device is on and in
discoverable mode. The Windows Bluetooth stack supports only the RFCOMM
protocol, so we cannot turn on your device programmatically!

@c FIXME: Change to unique title
@node How does it work2?
@subsection How does it work2?
@c %**end of header

The Bluetooth transport plugin uses virtually the same code as the WLAN
plugin and only the helper binary is different. The helper takes a single
argument, which represents the interface name and is specified in the
configuration file. Here are the basic steps that are followed by the
helper binary used on GNU/Linux:

@itemize @bullet
@item it verifies that the name corresponds to a Bluetooth interface name
@item it verifies that the interface is up (if it is not, it tries to
bring it up)
@item it tries to enable page and inquiry scan in order to make the
device discoverable and to accept incoming connection requests
@emph{The above operations require root access, so you should start the
transport plugin with root privileges.}
@item it finds an available port number and registers an SDP service
which other peers use to find out on which port the server is listening;
it then switches the socket into listening mode
@item it sends a HELLO message with its address
@item finally, it forwards traffic from the reading sockets to STDOUT
and from STDIN to the writing socket

Once in a while the device will make an inquiry scan to discover nearby
devices, and it will randomly send them HELLO messages for peer
discovery.

@node What possible errors should I be aware of?
@subsection What possible errors should I be aware of?
@c %**end of header

@emph{This section is dedicated to GNU/Linux users.}

There are many ways in which things can go wrong, but here are some tools
you can use to debug, along with some common scenarios.

@itemize @bullet

@item @code{bluetoothd -n -d} : use this command to enable logging in the
foreground and to print the logging messages

@item @code{hciconfig}: can be used to configure the Bluetooth devices.
If you run it without any arguments it will print information about the
state of the interfaces. So if you receive an error that the device
couldn't be brought up, you should try to bring it up manually and see if
that works (use @code{hciconfig -a hciX up}). If you can't, and the
Bluetooth address has the form 00:00:00:00:00:00, it means that there is
something wrong with the D-Bus daemon or with the Bluetooth daemon. Use
the @code{bluetoothd} tool to see the logs

@item @code{sdptool} can be used to control and interrogate SDP servers.
If you encounter problems regarding the SDP server (like the SDP server
being down), you should check whether the D-Bus daemon is running
correctly and whether the Bluetooth daemon started correctly (use the
@code{bluetoothd} tool). Also, sometimes the SDP service could work but
somehow the device couldn't register its service. Use
@code{sdptool browse [dev-address]} to see if the service is registered.
There should be a service with the name of the interface and GNUnet as
provider.

@item @code{hcitool}: another useful tool which can be used to configure
the device and to send particular commands to it.

@item @code{hcidump}: can be used for low-level debugging
@end itemize

@c FIXME: A more unique name
@node How do I configure my peer2?
@subsection How do I configure my peer2?
@c %**end of header

On GNU/Linux, you just have to be sure that the interface name
corresponds to the one that you want to use.
Use the @code{hciconfig} tool to check that.
By default it is set to hci0 but you can change it.

A basic configuration looks like this:

@example
[transport-bluetooth]
# Name of the interface (typically hciX)
INTERFACE = hci0
# Real hardware, no testing
TESTMODE = 0
TESTING_IGNORE_KEYS = ACCEPT_FROM;
@end example

In order to use the Bluetooth transport plugin when the transport service
is started, you must add the plugin name to the default transport service
plugins list. For example:

@example
[transport]
...
PLUGINS = dns bluetooth
...
@end example

If you want to use only the Bluetooth plugin, set
@emph{PLUGINS = bluetooth}.

On Windows, you cannot specify which device to use. The only thing that
you should do is to add @emph{bluetooth} on the plugins list of the
transport service.

@node How can I test it?
@subsection How can I test it?
@c %**end of header

If you have two Bluetooth devices on the same machine and you are using
GNU/Linux you must:

@itemize @bullet

@item create two different configuration files (one which will use the
first interface (@emph{hci0}) and the other which will use the second
interface (@emph{hci1})). Let's name them @emph{peer1.conf} and
@emph{peer2.conf}.

@item run @emph{gnunet-peerinfo -c peerX.conf -s} in order to generate
the peers' private keys. The @strong{X} must be replaced with 1 or 2.

@item run @emph{gnunet-arm -c peerX.conf -s -i=transport} in order to
start the transport service. (Make sure that you have "bluetooth" on the
transport plugins list if the Bluetooth transport service doesn't start.)

@item run @emph{gnunet-peerinfo -c peer1.conf -s} to get the first peer's
ID. If you already know your peer ID (you saved it from the first
command), this can be skipped.

@item run @emph{gnunet-transport -c peer2.conf -p=PEER1_ID -s} to start
sending data for benchmarking to the other peer.

@end itemize


This scenario will try to connect the second peer to the first one and
then start sending data for benchmarking.
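For reference, here is a minimal sketch of what @emph{peer1.conf} could
contain for this scenario, using only the options shown earlier in this
section (everything else is left at its defaults; @emph{peer2.conf} would
differ only in the interface name):

@example
[transport]
PLUGINS = bluetooth

[transport-bluetooth]
# First Bluetooth interface; peer2.conf would use hci1 instead
INTERFACE = hci0
# Real hardware, no testing
TESTMODE = 0
@end example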

On Windows you cannot test the plugin functionality using two Bluetooth
devices from the same machine, because after you install the drivers some
conflicts occur between the Bluetooth stacks. (At least that is what
happened on my machine: I wasn't able to use the Bluesoleil stack and the
WIDCOMM one at the same time.)

If you have two different machines and your configuration files are good,
you can use the same scenario presented at the beginning of this section.

Another way to test the plugin functionality is to create your own
application which will use the GNUnet framework with the Bluetooth
transport service.

@node The implementation of the Bluetooth transport plugin
@subsection The implementation of the Bluetooth transport plugin
@c %**end of header

This page describes the implementation of the Bluetooth transport plugin.

First I want to remind you that the Bluetooth transport plugin uses
virtually the same code as the WLAN plugin and only the helper binary is
different. Also, the scope of the helper binary from the Bluetooth
transport plugin is the same as the one used for the WLAN transport
plugin: it accesses the interface and then forwards traffic in both
directions between the Bluetooth interface and the stdin/stdout of the
process involved.

The Bluetooth transport plugin can be used on both GNU/Linux and Windows
platforms.

@itemize @bullet
@item Linux functionality
@item Windows functionality
@item Pending Features
@end itemize



@menu
* Linux functionality::
* THE INITIALIZATION::
* THE LOOP::
* Details about the broadcast implementation::
* Windows functionality::
* Pending features::
@end menu

@node Linux functionality
@subsubsection Linux functionality
@c %**end of header

In order to implement the plugin functionality on GNU/Linux I
used the BlueZ stack.
For the communication with the other devices I used the RFCOMM
protocol. I also used the HCI protocol to gain some control over the
device. The helper binary takes a single argument (the name of the
Bluetooth interface) and is separated into two stages:

@c %** 'THE INITIALIZATION' should be in bigger letters or stand out, not
@c %** starting a new section?
@node THE INITIALIZATION
@subsubsection THE INITIALIZATION

@itemize @bullet
@item first, it checks if we have root privileges
(@emph{Remember that we need root privileges in order to be able
to bring the interface up if it is down or to change its state.}).

@item second, it verifies if the interface with the given name exists.

@strong{If the interface with that name exists and it is a Bluetooth
interface:}

@item it creates an RFCOMM socket which will be used for listening and
calls the @emph{open_device} method

In the @emph{open_device} method it:
@itemize @bullet
@item creates an HCI socket used to send control events to the device
@item searches for the device ID using the interface name
@item saves the device MAC address
@item checks if the interface is down and tries to bring it UP
@item checks if the interface is in discoverable mode and tries to make
it discoverable
@item closes the HCI socket and binds the RFCOMM one
@item switches the RFCOMM socket into listening mode
@item registers the SDP service (the service will be used by the other
devices to get the port on which this device is listening)
@end itemize

@item drops the root privileges

@strong{If the interface is not a Bluetooth interface, the helper exits
with a suitable error}
@end itemize

@c %** Same as for @node entry above
@node THE LOOP
@subsubsection THE LOOP

The helper binary uses a list where it saves all the connected neighbour
devices (@emph{neighbours.devices}) and two buffers (@emph{write_pout}
and @emph{write_std}). The first message which is sent is a control
message with the device's MAC address in order to announce the peer's
presence to the neighbours. Here is a short description of what happens
in the main loop:

@itemize @bullet
@item Every time it receives something from STDIN it processes
the data and saves the message in the first buffer (@emph{write_pout}).
When it has something in the buffer, it gets the destination address from
the buffer, searches for the destination address in the list (if there is
no connection with that device, it creates a new one and saves it to the
list) and sends the message.
@item Every time it receives something on the listening socket it
accepts the connection and saves the socket in a list of reading
sockets.
@item Every time it receives something from a reading
socket it parses the message, verifies the CRC and saves it in the
@emph{write_std} buffer in order to be sent later to STDOUT.
@end itemize

So in the main loop we use the select function to wait until one of the
file descriptors saved in one of the two file descriptor sets is
ready to use. The first set (@emph{rfds}) represents the reading set and
it can contain the list of reading sockets, the STDIN file
descriptor or the listening socket. The second set (@emph{wfds}) is the
writing set and it can contain the sending socket or the STDOUT file
descriptor. After the select function returns, we check which file
descriptor is ready to use and do what is supposed to be done on that
kind of event. @emph{For example:} if it is the listening socket, then we
accept a new connection and save the socket in the reading list; if it is
the STDOUT file descriptor, then we write to STDOUT the message from the
@emph{write_std} buffer.
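The rfds/wfds structure described above can be illustrated with a minimal
Python sketch. This mirrors the shape of the helper's loop (listening
socket, reading set, a write buffer flushed to the output side), but it
is not the actual helper code: it uses ordinary TCP sockets instead of
RFCOMM, and one iteration of the loop is shown as a function.

```python
import select
import socket

def run_loop_once(listening, readers, write_std, out_fd):
    """One iteration of a select()-based loop in the style described
    above: wait for readable sockets, accept new connections, collect
    incoming data into write_std, and flush it to out_fd when ready."""
    rfds = [listening] + readers            # reading set
    wfds = [out_fd] if write_std else []    # writing set, only if queued
    readable, writable, _ = select.select(rfds, wfds, [], 1.0)
    for s in readable:
        if s is listening:
            conn, _addr = s.accept()        # new connection: add to set
            readers.append(conn)
        else:
            data = s.recv(4096)             # message from a reading socket
            if data:
                write_std.append(data)      # queue for the output side
            else:
                readers.remove(s)           # peer closed the connection
                s.close()
    for s in writable:
        s.sendall(write_std.pop(0))         # flush one queued message
```

Three iterations are enough to accept a connection, read one message into
the buffer, and flush it out the other side.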

To find out on which port a device is listening, we connect to the local
SDP server and search for the registered service for that device.

@emph{You should be aware of the fact that if the device fails to connect
to another one when trying to send a message it will attempt one more
time. If it fails again, then it skips the message.}
@emph{Also you should know that the transport Bluetooth plugin has
support for @strong{broadcast messages}.}

@node Details about the broadcast implementation
@subsubsection Details about the broadcast implementation
@c %**end of header

First I want to point out that the broadcast functionality for the
CONTROL messages is not implemented in a conventional way. Since the
inquiry scan time is long and it would take some time to send a message
to all the discoverable devices, I decided to tackle the problem in a
different way. Here is how I did it:

@itemize @bullet
@item The first time I have to broadcast a message, I make an
inquiry scan and save all the devices' addresses to a vector.
@item After the inquiry scan ends, I take the first address from the list
and try to connect to it. If it fails, I try to connect to the next one.
If it succeeds, I save the socket to a list and send the message to the
device.
@item When I have to broadcast another message, I first search the list
for a new device which I'm not connected to. If there is no new device on
the list, I go to the beginning of the list and send the message to the
old devices. After 5 cycles I make a new inquiry scan to check whether
there are new discoverable devices, and save them to the list. If there
are no new discoverable devices, I reset the cycle counter and go again
through the old list, sending messages to the devices saved in it.
@end itemize

@strong{Therefore}:

@itemize @bullet
@item every time I have a broadcast message, I look in the list
for a new device and send the message to it
@item if I have reached the end of the list 5 times and I'm connected to
all the devices on the list, I make a new inquiry scan.
@emph{The number of list cycles after an inquiry scan can be
increased by redefining the MAX_LOOPS variable}
@item when there are no new devices, I send messages to the old ones.
@end itemize

Doing so, the broadcast control messages will reach the devices, albeit
with some delay.
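The cycling logic above can be condensed into a small sketch. The names
(MAX_LOOPS, the device list) follow the description, but this is an
illustration of the strategy, not the helper's actual code; the inquiry
scan is modelled as a callable that returns the currently discoverable
addresses.

```python
MAX_LOOPS = 5  # list cycles allowed before a new inquiry scan

class BroadcastState:
    def __init__(self, inquiry_scan):
        self.inquiry_scan = inquiry_scan  # returns discoverable addresses
        self.devices = []                 # known device addresses, in order
        self.next_index = 0               # next device to get a message
        self.loops = 0                    # completed cycles since last scan

    def next_target(self):
        """Pick the device that should receive the next broadcast
        message, rescanning after MAX_LOOPS cycles through the list."""
        if not self.devices:
            # first broadcast ever: do an inquiry scan to seed the list
            self.devices = list(self.inquiry_scan())
            if not self.devices:
                return None
        if self.next_index >= len(self.devices):   # end of the list
            self.next_index = 0
            self.loops += 1
            if self.loops >= MAX_LOOPS:            # time to rescan
                for addr in self.inquiry_scan():
                    if addr not in self.devices:   # keep only new devices
                        self.devices.append(addr)
                self.loops = 0
        target = self.devices[self.next_index]
        self.next_index += 1
        return target
```

Each broadcast message thus goes to one device, cycling through the known
list and only paying the inquiry-scan cost every MAX_LOOPS cycles.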

@emph{NOTICE:} When I have to send a message to a certain device, I first
check the broadcast list to see if we are connected to that device. If
not, we try to connect to it and, in case of success, we save the address
and the socket on the list. If we are already connected to that device,
we simply use the socket.

@node Windows functionality
@subsubsection Windows functionality
@c %**end of header

For Windows I decided to use the Microsoft Bluetooth stack, which has the
advantage of coming standard with Windows XP SP2. The main disadvantage
is that it only supports the RFCOMM protocol, so we will not be able to
have low-level control over the Bluetooth device. Therefore it is the
user's responsibility to check that the device is up and in discoverable
mode. Also, there are no tools which could be used for debugging in order
to read the data coming from and going to a Bluetooth device, which
obviously hindered my work. Another thing that slowed down the
implementation of the plugin (besides the fact that I was not very
familiar with the Win32 API) was that there were some bugs in MinGW
regarding Bluetooth. They are now solved, but you should keep in mind
that you should have the latest updates
(especially the @emph{ws2bth} header).

Besides the fact that it uses the Windows Sockets, the Windows
implementation follows the same principles as the GNU/Linux one:

@itemize @bullet
@item It has an initialization part where it initializes the
Windows Sockets, creates an RFCOMM socket which will be bound and
switched to listening mode, and registers an SDP service. In the
Microsoft Bluetooth API there are two ways to work with the SDP:
@itemize @bullet
@item an easy way which works with very simple service records
@item a hard way which is useful when you need to update or delete the
record
@end itemize
@end itemize

Since I only needed the SDP service to find out on which port the device
is listening, and that did not change, I decided to use the easy way.
In order to register the service I used the @emph{WSASetService}
function, and I generated the @emph{Universally Unique Identifier} with
the @emph{guidgen.exe} Windows tool.

In the loop section the only difference from the GNU/Linux implementation
is that I used the @code{GNUNET_NETWORK} library for
functions like @emph{accept}, @emph{bind}, @emph{connect} or
@emph{select}. I decided to use the
@code{GNUNET_NETWORK} library because I also needed to interact
with the STDIN and STDOUT handles and on Windows
the select function is only defined for sockets,
and it will not work for arbitrary file handles.

Another difference between the GNU/Linux and Windows implementations is
that on GNU/Linux the Bluetooth address is represented in 48 bits,
while on Windows it is represented in 64 bits.
Therefore I had to make some changes to the @emph{plugin_transport_wlan}
header.

Also, currently on Windows the Bluetooth plugin doesn't have support for
broadcast messages. When it receives a broadcast message it will skip it.

@node Pending features
@subsubsection Pending features
@c %**end of header

@itemize @bullet
@item Implement the broadcast functionality on Windows @emph{(currently
working on it)}
@item Implement a testcase for the helper: @emph{The testcase
consists of a program which emulates the plugin and uses the helper. It
will simulate connections, disconnections and data transfers.}
@end itemize

If you have a new idea about a feature of the plugin or suggestions about
how I could improve the implementation you are welcome to comment or to
contact me.

@node WLAN plugin
@section WLAN plugin
@c %**end of header

This section documents how the WLAN transport plugin works. Parts which
are not implemented yet or could be better implemented are described at
the end.

@cindex ATS Subsystem
@node ATS Subsystem
@section ATS Subsystem
@c %**end of header

ATS stands for "automatic transport selection", and the function of ATS in
GNUnet is to decide on which address (and thus transport plugin) should
be used for two peers to communicate, and what bandwidth limits should be
imposed on such an individual connection. To help ATS make an informed
decision, higher-level services inform the ATS service about their
requirements and the quality of the service rendered. The ATS service
also interacts with the transport service to be appraised of working
addresses and to communicate its resource allocation decisions. Finally,
the ATS service's operation can be observed using a monitoring API.

The main logic of the ATS service only collects the available addresses,
their performance characteristics and the applications' requirements, but
does not make the actual allocation decision. This last critical step is
left to an ATS plugin, as we have implemented (currently three) different
allocation strategies which differ significantly in their performance and
maturity, and it is still unclear if any particular plugin is generally
superior.

@cindex CORE Subsystem
@node CORE Subsystem
@section CORE Subsystem
@c %**end of header

The CORE subsystem in GNUnet is responsible for securing link-layer
communications between nodes in the GNUnet overlay network. CORE builds
on the TRANSPORT subsystem which provides for the actual, insecure,
unreliable link-layer communication (for example, via UDP or WLAN), and
then adds fundamental security to the connections:

@itemize @bullet
@item confidentiality with so-called perfect forward secrecy; we use
ECDHE@footnote{@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman, Elliptic-curve Diffie---Hellman}}
powered by Curve25519
@footnote{@uref{http://cr.yp.to/ecdh.html, Curve25519}} for the key
exchange and then use symmetric encryption, encrypting with both AES-256
@footnote{@uref{http://en.wikipedia.org/wiki/Rijndael, AES-256}} and
Twofish @footnote{@uref{http://en.wikipedia.org/wiki/Twofish, Twofish}}
@item @uref{http://en.wikipedia.org/wiki/Authentication, authentication}
is achieved by signing the ephemeral keys using Ed25519
@footnote{@uref{http://ed25519.cr.yp.to/, Ed25519}}, a deterministic
variant of ECDSA
@footnote{@uref{http://en.wikipedia.org/wiki/ECDSA, ECDSA}}
@item integrity protection (using SHA-512
@footnote{@uref{http://en.wikipedia.org/wiki/SHA-2, SHA-512}} to do
encrypt-then-MAC
@footnote{@uref{http://en.wikipedia.org/wiki/Authenticated_encryption, encrypt-then-MAC}})
@item Replay
@footnote{@uref{http://en.wikipedia.org/wiki/Replay_attack, replay}}
protection (using nonces, timestamps, challenge-response,
message counters and ephemeral keys)
@item liveness (keep-alive messages, timeout)
@end itemize

@menu
* Limitations::
* When is a peer "connected"?::
* libgnunetcore::
* The CORE Client-Service Protocol::
* The CORE Peer-to-Peer Protocol::
@end menu

@cindex core subsystem limitations
@node Limitations
@subsection Limitations
@c %**end of header

CORE does not perform
@uref{http://en.wikipedia.org/wiki/Routing, routing}; using CORE it is
only possible to communicate with peers that happen to already be
"directly" connected with each other. CORE also does not have an
API to allow applications to establish such "direct" connections --- for
this, applications can ask TRANSPORT, but TRANSPORT might not be able to
establish a "direct" connection. The TOPOLOGY subsystem is responsible for
trying to keep a few "direct" connections open at all times. Applications
that need to talk to particular peers should use the CADET subsystem, as
it can establish arbitrary "indirect" connections.

Because CORE does not perform routing, CORE must only be used directly by
applications that either perform their own routing logic (such as
anonymous file-sharing) or that do not require routing, for example
because they are based on flooding the network. CORE communication is
unreliable and delivery is possibly out-of-order. Applications that
require reliable communication should use the CADET service. Each
application can only queue one message per target peer with the CORE
service at any time; messages cannot be larger than approximately
63 kilobytes. If messages are small, CORE may group multiple messages
(possibly from different applications) prior to encryption. If permitted
by the application (using the @uref{http://baus.net/on-tcp_cork/, cork}
option), CORE may delay transmissions to facilitate grouping of multiple
small messages. If cork is not enabled, CORE will transmit the message as
soon as TRANSPORT allows it (TRANSPORT is responsible for limiting
bandwidth and congestion control). CORE does not allow flow control;
applications are expected to process messages at line-speed. If flow
control is needed, applications should use the CADET service.

@cindex when is a peer connected
@node When is a peer "connected"?
@subsection When is a peer "connected"?
@c %**end of header

In addition to the security features mentioned above, CORE also provides
one additional key feature to applications using it, and that is a
limited form of protocol-compatibility checking. CORE distinguishes
between TRANSPORT-level connections (which enable communication with other
peers) and application-level connections. Applications using the CORE API
will (typically) learn about application-level connections from CORE, and
not about TRANSPORT-level connections. When a typical application uses
CORE, it will specify a set of message types
(from @code{gnunet_protocols.h}) that it understands. CORE will then
notify the application about connections it has with other peers if and
only if those applications registered an intersecting set of message
types with their CORE service. Thus, it is quite possible that CORE only
exposes a subset of the established direct connections to a particular
application --- and different applications running above CORE might see
different sets of connections at the same time.
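The notification rule described above amounts to a set intersection. The
following Python sketch (illustrative only, not GNUnet code, with made-up
message type numbers) shows which local applications CORE would tell
about a new connection:

```python
def notified_applications(local_typemaps, remote_types):
    """Return the local applications that CORE would notify about a
    connection to a peer whose applications registered remote_types.
    An application that registered no types is a monitor: it is
    notified about all connections."""
    notified = []
    for app, types in local_typemaps.items():
        # notify if the type sets intersect, or if the app is a monitor
        if not types or types & remote_types:
            notified.append(app)
    return notified
```

So an application whose registered types do not overlap with the remote
peer's never learns that the "direct" connection exists.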

A special case are applications that do not register a handler for any
message type.
CORE assumes that these applications merely want to monitor connections
(or "all" messages via other callbacks) and will notify those applications
about all connections. This is used, for example, by the
@code{gnunet-core} command-line tool to display the active connections.
Note that it is also possible that the TRANSPORT service has more active
connections than the CORE service, as the CORE service first has to
perform a key exchange with connecting peers before exchanging information
about supported message types and notifying applications about the new
connection.

@cindex libgnunetcore
@node libgnunetcore
@subsection libgnunetcore
@c %**end of header

The CORE API (defined in @file{gnunet_core_service.h}) is the basic
messaging API used by P2P applications built using GNUnet. It gives
applications the ability to exchange encrypted messages with the
peer's "directly" connected neighbours.

As CORE connections are generally "direct" connections, applications must
not assume that they can connect to arbitrary peers this way, as "direct"
connections may not always be possible. Applications using CORE are
notified about which peers are connected. Creating new "direct"
connections must be done using the TRANSPORT API.

The CORE API provides unreliable, out-of-order delivery. While the
implementation tries to ensure timely, in-order delivery, both message
losses and reordering are not detected and must be tolerated by the
application. Most importantly, CORE will NOT perform retransmission if
messages could not be delivered.

Note that CORE allows applications to queue one message per connected
peer. The rate at which each connection operates is influenced by the
preferences expressed by the local application as well as restrictions
imposed by the other peer. Local applications can express their
preferences for particular connections using the "performance" API of the
ATS service.

Applications that require more sophisticated transmission capabilities,
such as TCP-like behavior, or that intend to send messages to arbitrary
remote peers, should use the CADET API.

The typical use of the CORE API is to connect to the CORE service using
@code{GNUNET_CORE_connect}, process events from the CORE service (such as
peers connecting, peers disconnecting and incoming messages) and send
messages to connected peers using
@code{GNUNET_CORE_notify_transmit_ready}. Note that applications must
cancel pending transmission requests if they receive a disconnect event
for a peer that had a transmission pending; furthermore, queueing more
than one transmission request per peer per application using the
service is not permitted.

The CORE API also allows applications to monitor all communications of the
peer prior to encryption (for outgoing messages) or after decryption (for
incoming messages). This can be useful for debugging, diagnostics or to
establish the presence of cover traffic (for anonymity). As monitoring
applications are often not interested in the payload, the monitoring
callbacks can be configured to only provide the message headers (including
the message type and size) instead of copying the full data stream to the
monitoring client.

The init callback of the @code{GNUNET_CORE_connect} function is called
with the hash of the public key of the peer. This public key is used to
identify the peer globally in the GNUnet network. Applications are
encouraged to check that the provided hash matches the hash that they are
using (as theoretically the application may be using a different
configuration file with a different private key, which would result in
hard to find bugs).

As with most service APIs, the CORE API isolates applications from crashes
of the CORE service. If the CORE service crashes, the application will see
disconnect events for all existing connections. Once the connections are
re-established, the applications will receive matching connect events.

@cindex core client-service protocol
@node The CORE Client-Service Protocol
@subsection The CORE Client-Service Protocol
@c %**end of header

This section describes the protocol between an application using the CORE
service (the client) and the CORE service process itself.


@menu
* Setup2::
* Notifications::
* Sending::
@end menu

@node Setup2
@subsubsection Setup2
@c %**end of header

When a client connects to the CORE service, it first sends a
@code{InitMessage} which specifies options for the connection and a set of
message type values which are supported by the application. The options
bitmask specifies which events the client would like to be notified about.
The options include:

@table @asis
@item GNUNET_CORE_OPTION_NOTHING No notifications
@item GNUNET_CORE_OPTION_STATUS_CHANGE Peers connecting and disconnecting
@item GNUNET_CORE_OPTION_FULL_INBOUND All inbound messages (after
decryption) with full payload
@item GNUNET_CORE_OPTION_HDR_INBOUND Just the @code{MessageHeader}
of all inbound messages
@item GNUNET_CORE_OPTION_FULL_OUTBOUND All outbound
messages (prior to encryption) with full payload
@item GNUNET_CORE_OPTION_HDR_OUTBOUND Just the @code{MessageHeader} of all
outbound messages
@end table

Typical applications will only monitor for connection status changes.
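Since the options form a bitmask, a client combines them with bitwise OR.
The sketch below illustrates this; the bit values are placeholders chosen
for illustration, the real constants are defined in the CORE headers.

```python
# Illustrative placeholder values -- the real constants live in the
# GNUnet CORE headers; these bit positions are assumptions.
GNUNET_CORE_OPTION_NOTHING       = 0
GNUNET_CORE_OPTION_STATUS_CHANGE = 1 << 0
GNUNET_CORE_OPTION_FULL_INBOUND  = 1 << 1
GNUNET_CORE_OPTION_HDR_INBOUND   = 1 << 2
GNUNET_CORE_OPTION_FULL_OUTBOUND = 1 << 3
GNUNET_CORE_OPTION_HDR_OUTBOUND  = 1 << 4

# A typical application only watches connection status changes:
options = GNUNET_CORE_OPTION_STATUS_CHANGE

# A monitoring tool might additionally ask for headers of all traffic:
monitor = (GNUNET_CORE_OPTION_STATUS_CHANGE
           | GNUNET_CORE_OPTION_HDR_INBOUND
           | GNUNET_CORE_OPTION_HDR_OUTBOUND)

def wants(options_mask, flag):
    """Check whether a given kind of notification was requested."""
    return (options_mask & flag) != 0
```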

The CORE service responds to the @code{InitMessage} with an
@code{InitReplyMessage} which contains the peer's identity. Afterwards,
both CORE and the client can send messages.

@node Notifications
@subsubsection Notifications
@c %**end of header

The CORE will send @code{ConnectNotifyMessage}s and
@code{DisconnectNotifyMessage}s whenever peers connect or disconnect from
the CORE (assuming their type maps overlap with the message types
registered by the client). When the CORE receives a message that matches
the set of message types specified during the @code{InitMessage} (or if
monitoring is enabled for inbound messages in the options), it sends a
@code{NotifyTrafficMessage} with the peer identity of the sender and the
decrypted payload. The same message format (except with
@code{GNUNET_MESSAGE_TYPE_CORE_NOTIFY_OUTBOUND} for the message type) is
used to notify clients monitoring outbound messages; here, the peer
identity given is that of the receiver.

@node Sending
@subsubsection Sending
@c %**end of header

When a client wants to transmit a message, it first requests a
transmission slot by sending a @code{SendMessageRequest} which specifies
the priority, deadline and size of the message. Note that these values
may be ignored by CORE. When CORE is ready for the message, it answers
with a @code{SendMessageReady} response. The client can then transmit the
payload with a @code{SendMessage} message. Note that the actual message
size in the @code{SendMessage} is allowed to be smaller than the size in
the original request. A client may at any time send a fresh
@code{SendMessageRequest}, which supersedes the previous
@code{SendMessageRequest}, which is then no longer valid. The client can
tell which @code{SendMessageRequest} the CORE service's
@code{SendMessageReady} message is for as all of these messages contain a
"unique" request ID (based on a counter incremented by the client
for each request).
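The request-ID matching described above can be sketched in C; the struct and function names here are invented, only the counter-based mechanism and the "fresh request supersedes the old one" rule come from the protocol description.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative client-side bookkeeping for SendMessageRequest IDs. */
struct RequestTracker
{
  uint32_t next_id;    /* counter incremented for each request */
  uint32_t pending_id; /* ID of the currently valid request */
  int have_pending;    /* is there an outstanding request? */
};

/* Issue a fresh SendMessageRequest; any previous one is superseded. */
static uint32_t
issue_request (struct RequestTracker *t)
{
  t->pending_id = t->next_id++;
  t->have_pending = 1;
  return t->pending_id;
}

/* A SendMessageReady only matters if it matches the latest request. */
static int
ready_matches (const struct RequestTracker *t, uint32_t ready_id)
{
  return t->have_pending && (ready_id == t->pending_id);
}
```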

@cindex CORE Peer-to-Peer Protocol
@node The CORE Peer-to-Peer Protocol
@subsection The CORE Peer-to-Peer Protocol
@c %**end of header


@menu
* Creating the EphemeralKeyMessage::
* Establishing a connection::
* Encryption and Decryption::
* Type maps::
@end menu

@cindex EphemeralKeyMessage creation
@node Creating the EphemeralKeyMessage
@subsubsection Creating the EphemeralKeyMessage
@c %**end of header

When the CORE service starts, each peer creates a fresh ephemeral (ECC)
public-private key pair and signs the corresponding
@code{EphemeralKeyMessage} with its long-term key (which we usually call
the peer's identity; the hash of the public long-term key is what results
in a @code{struct GNUNET_PeerIdentity} in all GNUnet APIs). The ephemeral
key is ONLY used for an ECDHE@footnote{@uref{http://en.wikipedia.org/wiki/Elliptic_curve_Diffie%E2%80%93Hellman, Elliptic-curve Diffie---Hellman}}
exchange by the CORE service to establish symmetric session keys. A peer
will use the same @code{EphemeralKeyMessage} for all peers for
@code{REKEY_FREQUENCY}, which is usually 12 hours. After that time, it
will create a fresh ephemeral key (forgetting the old one) and broadcast
the new @code{EphemeralKeyMessage} to all connected peers, resulting in
fresh symmetric session keys. Note that peers independently decide on
when to discard ephemeral keys; it is not a protocol violation to discard
keys more often. Ephemeral keys are also never stored to disk; restarting
a peer will thus always create a fresh ephemeral key. The use of ephemeral
keys is what provides @uref{http://en.wikipedia.org/wiki/Forward_secrecy, forward secrecy}.

Just before transmission, the @code{EphemeralKeyMessage} is patched to
reflect the current sender_status, which specifies the current state of
the connection from the point of view of the sender. The possible values
are:

@itemize @bullet
@item @code{KX_STATE_DOWN} Initial value, never used on the network
@item @code{KX_STATE_KEY_SENT} We sent our ephemeral key, do not know the
key of the other peer
@item @code{KX_STATE_KEY_RECEIVED} This peer has received a valid
ephemeral key of the other peer, but we are waiting for the other peer to
confirm its authenticity (ability to decode) via challenge-response.
@item @code{KX_STATE_UP} The connection is fully up from the point of
view of the sender (now performing keep-alives)
@item @code{KX_STATE_REKEY_SENT} The sender has initiated a rekeying
operation; the other peer has so far failed to confirm a working
connection using the new ephemeral key
@end itemize

@node Establishing a connection
@subsubsection Establishing a connection
@c %**end of header

Peers begin their interaction by sending a @code{EphemeralKeyMessage} to
the other peer once the TRANSPORT service notifies the CORE service about
the connection.
If a peer receives an @code{EphemeralKeyMessage} with a status
indicating that the sender does not yet have the receiver's ephemeral
key, the receiver sends its own @code{EphemeralKeyMessage} in response.
Additionally, if the receiver has not yet confirmed the authenticity of
the sender, it also sends an (encrypted) @code{PingMessage} with a
challenge (and the identity of the target) to the other peer. Peers
receiving a @code{PingMessage} respond with an (encrypted)
@code{PongMessage} which includes the challenge. Peers receiving a
@code{PongMessage} check the challenge, and if it matches set the
connection to @code{KX_STATE_UP}.
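The state transitions during connection establishment can be sketched as a toy state machine. The enum values mirror the sender_status list above, but the transition helper is a simplification invented here; it is not the actual logic of @file{gnunet-service-core_kx.c}.

```c
#include <assert.h>

/* States from the sender_status list above. */
enum KxState
{
  KX_STATE_DOWN,         /* initial value, never sent on the network */
  KX_STATE_KEY_SENT,     /* our ephemeral key sent, peer's unknown */
  KX_STATE_KEY_RECEIVED, /* peer's key received, awaiting PONG */
  KX_STATE_UP,           /* connection fully up, keep-alives running */
  KX_STATE_REKEY_SENT    /* rekey initiated, not yet confirmed */
};

/* On a PONG whose challenge matches, the connection goes up;
   an invalid challenge leaves the state unchanged. */
static enum KxState
handle_pong (enum KxState s, int challenge_ok)
{
  if (challenge_ok &&
      ((KX_STATE_KEY_RECEIVED == s) || (KX_STATE_REKEY_SENT == s)))
    return KX_STATE_UP;
  return s;
}
```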

@node Encryption and Decryption
@subsubsection Encryption and Decryption
@c %**end of header

All functions related to the key exchange and encryption/decryption of
messages can be found in @file{gnunet-service-core_kx.c} (except for the
cryptographic primitives, which are in @file{util/crypto*.c}).
Given the key material from ECDHE, a key derivation function
@footnote{@uref{https://en.wikipedia.org/wiki/Key_derivation_function, Key derivation function}}
is used to derive two pairs of encryption and decryption keys for AES-256
and TwoFish, as well as initialization vectors and authentication keys
(for HMAC@footnote{@uref{https://en.wikipedia.org/wiki/HMAC, HMAC}}).
The HMAC is computed over the encrypted payload.
Encrypted messages include an iv_seed and the HMAC in the header.

Each encrypted message in the CORE service includes a sequence number and
a timestamp in the encrypted payload. The CORE service remembers the
largest observed sequence number and a bit-mask which represents which of
the previous 32 sequence numbers were already used.
Messages with sequence numbers lower than the largest observed sequence
number minus 32 are discarded. Messages with a timestamp that is off by
more than @code{REKEY_TOLERANCE} (5 minutes) are also discarded. This of
course means that system clocks need to be reasonably synchronized for
peers to be able to communicate. Additionally, as the ephemeral key
changes every 12 hours, a peer would not even be able to decrypt messages
older than 12 hours.
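A minimal sketch of the sliding-window replay check described above: remember the largest observed sequence number plus a 32-bit mask of which of the preceding 32 sequence numbers were already used. The struct and function names are invented; only the mechanism is from the text.

```c
#include <assert.h>
#include <stdint.h>

struct ReplayWindow
{
  uint32_t max_seen; /* largest sequence number observed */
  uint32_t bitmap;   /* bit i set: sequence (max_seen - 1 - i) seen */
};

/* Return 1 if the message should be accepted (and record it),
   0 if it must be discarded as a duplicate or as too old. */
static int
accept_seq (struct ReplayWindow *w, uint32_t seq)
{
  if (seq > w->max_seen)
  {
    uint32_t shift = seq - w->max_seen;
    /* Slide the window forward, recording the old maximum. */
    if (shift >= 33)
      w->bitmap = 0;
    else if (shift == 32)
      w->bitmap = 1u << 31;
    else
      w->bitmap = (w->bitmap << shift) | (1u << (shift - 1));
    w->max_seen = seq;
    return 1;
  }
  if (seq == w->max_seen)
    return 0;                    /* duplicate */
  uint32_t diff = w->max_seen - seq;
  if (diff > 32)
    return 0;                    /* too far in the past */
  uint32_t bit = 1u << (diff - 1);
  if (w->bitmap & bit)
    return 0;                    /* already used */
  w->bitmap |= bit;
  return 1;
}
```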

@node Type maps
@subsubsection Type maps
@c %**end of header

Once an encrypted connection has been established, peers begin to exchange
type maps. Type maps are used to allow the CORE service to determine which
(encrypted) connections should be shown to which applications. A type map
is an array of 65536 bits representing the different types of messages
understood by applications using the CORE service. Each CORE service
maintains this map, simply by setting the respective bit for each message
type supported by any of the applications using the CORE service. Note
that bits for message types embedded in higher-level protocols (such as
MESH) will not be included in these type maps.
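A type map can be sketched as an 8 KiB bit array. The helper names here are invented; only the 65536-bit layout is from the description above.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* 65536 bits = 8192 bytes, one bit per possible message type. */
struct TypeMap
{
  uint8_t bits[65536 / 8];
};

/* Set the bit for a message type supported by some application. */
static void
type_map_set (struct TypeMap *tm, uint16_t type)
{
  tm->bits[type / 8] |= (uint8_t) (1 << (type % 8));
}

/* Test whether a message type is announced in the map. */
static int
type_map_test (const struct TypeMap *tm, uint16_t type)
{
  return 0 != (tm->bits[type / 8] & (1 << (type % 8)));
}
```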

Typically, the type map of a peer will be sparse. Thus, the CORE service
attempts to compress its type map using @code{gzip}-style compression
("deflate") prior to transmission. However, if the compression fails to
compact the map, the map may also be transmitted without compression
(resulting in @code{GNUNET_MESSAGE_TYPE_CORE_COMPRESSED_TYPE_MAP} or
@code{GNUNET_MESSAGE_TYPE_CORE_BINARY_TYPE_MAP} messages respectively).
Upon receiving a type map, the respective CORE service notifies
applications about the connection to the other peer if they support any
message type indicated in the type map (or no message type at all).
If the CORE service experiences a connect or disconnect event from an
application, it updates its type map (setting or unsetting the respective
bits) and notifies its neighbours about the change.
The CORE services of the neighbours then in turn generate connect and
disconnect events for the peer that sent the type map for their respective
applications. As CORE messages may be lost, the CORE service confirms
receiving a type map by sending back a
@code{GNUNET_MESSAGE_TYPE_CORE_CONFIRM_TYPE_MAP}. If such a confirmation
(with the correct hash of the type map) is not received, the sender will
retransmit the type map (with exponential back-off).

@cindex CADET Subsystem
@node CADET Subsystem
@section CADET Subsystem

The CADET subsystem in GNUnet is responsible for secure end-to-end
communications between nodes in the GNUnet overlay network. CADET builds
on the CORE subsystem which provides for the link-layer communication and
then adds routing, forwarding and additional security to the connections.
CADET offers the same cryptographic services as CORE, but on an
end-to-end level. This is done so peers retransmitting traffic on behalf
of other peers cannot access the payload data.

@itemize @bullet
@item CADET provides confidentiality with so-called perfect forward
secrecy; we use ECDHE powered by Curve25519 for the key exchange and then
use symmetric encryption, encrypting with both AES-256 and Twofish
@item authentication is achieved by signing the ephemeral keys using
Ed25519, a deterministic variant of ECDSA
@item integrity protection (using SHA-512 to do encrypt-then-MAC, although
only 256 bits are sent to reduce overhead)
@item replay protection (using nonces, timestamps, challenge-response,
message counters and ephemeral keys)
@item liveness (keep-alive messages, timeout)
@end itemize

Additional to the CORE-like security benefits, CADET offers other
properties that make it a more universal service than CORE.

@itemize @bullet
@item CADET can establish channels to arbitrary peers in GNUnet. If a
peer is not immediately reachable, CADET will find a path through the
network and ask other peers to retransmit the traffic on its behalf.
@item CADET offers (optional) reliability mechanisms. In a reliable
channel traffic is guaranteed to arrive complete, unchanged and in-order.
@item CADET takes care of flow and congestion control mechanisms, not
allowing the sender to send more traffic than the receiver or the network
are able to process.
@end itemize

@menu
* libgnunetcadet::
@end menu

@cindex libgnunetcadet
@node libgnunetcadet
@subsection libgnunetcadet


The CADET API (defined in @file{gnunet_cadet_service.h}) is the
messaging API used by P2P applications built using GNUnet.
It provides applications the ability to send and receive encrypted
messages to any peer participating in GNUnet.
The API is heavily based on the CORE API.

CADET delivers messages to other peers in "channels".
A channel is a permanent connection defined by a destination peer
(identified by its public key) and a port number.
Internally, CADET tunnels all channels towards a destination peer
using one session key and relays the data on multiple "connections",
independent from the channels.

Each channel has optional parameters, the most important being the
reliability flag.
Should a message get lost at the TRANSPORT/CORE level, and the channel
was created as reliable, CADET will retransmit the lost message and
deliver it in order to the destination application.

To communicate with other peers using CADET, it is necessary to first
connect to the service using @code{GNUNET_CADET_connect}.
This function takes several parameters in form of callbacks, to allow the
client to react to various events, like incoming channels or channels that
terminate, as well as specify a list of ports the client wishes to listen
to (at the moment it is not possible to start listening on further ports
once connected, but nothing prevents a client from connecting several
times to CADET, even with one connection per listening port).
The function returns a handle which has to be used for any further
interaction with the service.

To connect to a remote peer a client has to call the
@code{GNUNET_CADET_channel_create} function. The most important parameters
given are the remote peer's identity (its public key) and a port, which
specifies which application on the remote peer to connect to, similar to
TCP/UDP ports. CADET will then find the peer in the GNUnet network and
establish the proper low-level connections and do the necessary key
exchanges to assure an authenticated, secure and verified communication.
Similar to @code{GNUNET_CADET_connect}, @code{GNUNET_CADET_channel_create}
returns a handle to interact with the created channel.

For every message the client wants to send to the remote application,
@code{GNUNET_CADET_notify_transmit_ready} must be called, indicating the
channel on which the message should be sent and the size of the message
(but not the message itself!). Once CADET is ready to send the message,
the provided callback will fire, and the message contents are provided to
this callback.

Please note the CADET does not provide an explicit notification of when a
channel is connected. In loosely connected networks, like big wireless
mesh networks, this can take several seconds, even minutes in the worst
case. To be alerted when a channel is online, a client can call
@code{GNUNET_CADET_notify_transmit_ready} immediately after
@code{GNUNET_CADET_channel_create}. When the callback is activated, it
means that the channel is online. The callback can give 0 bytes to CADET
if no message is to be sent; this is OK.

If a transmission was requested but before the callback fires it is no
longer needed, it can be cancelled with
@code{GNUNET_CADET_notify_transmit_ready_cancel}, which uses the handle
given back by @code{GNUNET_CADET_notify_transmit_ready}.
As in the case of CORE, only one message can be requested at a time: a
client must not call @code{GNUNET_CADET_notify_transmit_ready} again until
the callback is called or the request is cancelled.

When a channel is no longer needed, a client can call
@code{GNUNET_CADET_channel_destroy} to get rid of it.
Note that CADET will try to transmit all pending traffic before notifying
the remote peer of the destruction of the channel, including
retransmitting lost messages if the channel was reliable.

Incoming channels, channels being closed by the remote peer, and traffic
on any incoming or outgoing channels are given to the client when CADET
executes the callbacks given to it at the time of
@code{GNUNET_CADET_connect}.

Finally, when an application no longer wants to use CADET, it should call
@code{GNUNET_CADET_disconnect}, but first all channels and pending
transmissions must be closed (otherwise CADET will complain).

@cindex NSE Subsystem
@node NSE Subsystem
@section NSE Subsystem


NSE stands for @dfn{Network Size Estimation}. The NSE subsystem provides
other subsystems and users with a rough estimate of the number of peers
currently participating in the GNUnet overlay.
The computed value is not a precise number as producing a precise number
in a decentralized, efficient and secure way is impossible.
While NSE's estimate is inherently imprecise, NSE also gives the expected
range. For a peer that has been running in a stable network for a
while, the real network size will typically (99.7% of the time) be in the
range of [2/3 estimate, 3/2 estimate]. We will now give an overview of the
algorithm used to calculate the estimate;
all of the details can be found in this technical report.

@c FIXME: link to the report.

@menu
* Motivation::
* Principle::
* libgnunetnse::
* The NSE Client-Service Protocol::
* The NSE Peer-to-Peer Protocol::
@end menu

@node Motivation
@subsection Motivation


Some subsystems, like DHT, need to know the size of the GNUnet network to
optimize some parameters of their own protocol. The decentralized nature
of GNUnet makes efficiently and securely counting the exact number of
peers infeasible. Although there are several decentralized algorithms to count
the number of peers in a system, so far there is none to do so securely.
Other protocols may allow any malicious peer to manipulate the final
result or to take advantage of the system to perform
@dfn{Denial of Service} (DoS) attacks against the network.
GNUnet's NSE protocol avoids these drawbacks.



@menu
* Security::
@end menu

@cindex NSE security
@cindex nse security
@node Security
@subsubsection Security


The NSE subsystem is designed to be resilient against these attacks.
It uses @uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proofs of work}
to prevent one peer from impersonating a large number of participants,
which would otherwise allow an adversary to artificially inflate the
estimate.
The DoS protection comes from the time-based nature of the protocol:
the estimates are calculated periodically and out-of-time traffic is
either ignored or stored for later retransmission by benign peers.
In particular, peers cannot trigger global network communication at will.

@cindex NSE principle
@cindex nse principle
@node Principle
@subsection Principle


The algorithm calculates the estimate by finding the globally closest
peer ID to a random, time-based value.

The idea is that the closer the ID is to the random value, the more
"densely packed" the ID space is, and therefore, more peers are in the
network.



@menu
* Example::
* Algorithm::
* Target value::
* Timing::
* Controlled Flooding::
* Calculating the estimate::
@end menu

@node Example
@subsubsection Example


Suppose all peers have IDs between 0 and 100 (our ID space), and the
random value is 42.
If the closest peer has the ID 70 we can imagine that the average
"distance" between peers is around 30 and therefore there are around 3
peers in the whole ID space. On the other hand, if the closest peer has
the ID 44, we can imagine that the space is rather packed with peers,
maybe as many as 50 of them.
Naturally, we could have been rather unlucky, and there is only one peer,
which happens to have the ID 44. Thus, the current estimate is calculated
as the average over multiple rounds, and not just a single sample.
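For this toy example, a naive single-sample estimate is simply the size of the ID space divided by the distance to the closest peer. The helper below is purely illustrative; the real formula is in the technical report and, as noted, averages over many rounds.

```c
#include <assert.h>

/* If peers are roughly evenly spread, the gap between the target and
   the closest peer approximates the average gap between peers, so
   the peer count is roughly space / distance. */
static unsigned int
naive_estimate (unsigned int space, unsigned int distance)
{
  return space / distance;
}
```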

@node Algorithm
@subsubsection Algorithm


Given that example, one can imagine that the job of the subsystem is to
efficiently communicate the ID of the closest peer to the target value
to all the other peers, who will calculate the estimate from it.

@node Target value
@subsubsection Target value

@c %**end of header

The target value itself is generated by hashing the current time, rounded
down to an agreed value. If the rounding amount is 1h (default) and the
time is 12:34:56, the time to hash would be 12:00:00. The process is
repeated every rounding amount (in this example, every hour).
Every repetition is called a round.
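Rounding the current time down to the agreed rounding amount is straightforward; the helper name below is invented, and the rounded result is what gets hashed into the round's target value.

```c
#include <assert.h>
#include <time.h>

/* Round a UNIX timestamp down to a multiple of 'rounding' seconds. */
static time_t
round_down (time_t now, time_t rounding)
{
  return now - (now % rounding);
}
```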

@node Timing
@subsubsection Timing
@c %**end of header

The NSE subsystem has some timing control to avoid everybody broadcasting
its ID all at once. Once each peer has the target random value, it
compares its own ID to the target and calculates the hypothetical size of
the network if that peer were to be the closest.
Then it compares the hypothetical size with the estimate from the previous
rounds. For each value there is an associated point in the period,
let's call it the "broadcast time". If its own hypothetical estimate
is the same as the previous global estimate, its broadcast time will be
in the middle of the round. If it is bigger it will be earlier and if it
is smaller (the most likely case) it will be later. This ensures that the
peers closest to the target value start broadcasting their IDs first.
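The ordering can be sketched with a simple mapping from the peer's hypothetical estimate to a fraction of the round. Only the ordering (equal: middle of the round, bigger: earlier, smaller: later) is from the text; the linear form and the constant 0.05 are arbitrary choices for illustration.

```c
#include <assert.h>

/* Map the difference between this peer's hypothetical estimate and
   the previous global estimate to a broadcast time, expressed as a
   fraction of the round in [0, 1]. */
static double
broadcast_fraction (double hypothetical, double previous)
{
  double f = 0.5 - 0.05 * (hypothetical - previous);
  if (f < 0.0)
    f = 0.0;
  if (f > 1.0)
    f = 1.0;
  return f;
}
```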

@node Controlled Flooding
@subsubsection Controlled Flooding

@c %**end of header

When a peer receives a value, first it verifies that it is closer than the
closest value it had so far, otherwise it answers the incoming message
with a message containing the better value. Then it checks a proof of
work that must be included in the incoming message, to ensure that the
other peer's ID is not made up (otherwise a malicious peer could claim to
have an ID of exactly the target value every round). Once validated, it
compares the broadcast time of the received value with the current time
and if it's not too early, sends the received value to its neighbors.
Otherwise it stores the value until the correct broadcast time comes.
This prevents unnecessary traffic of sub-optimal values, since a better
value can come before the broadcast time, rendering the previous one
obsolete and saving the traffic that would have been used to broadcast it
to the neighbors.

@node Calculating the estimate
@subsubsection Calculating the estimate

@c %**end of header

Once the closest ID has been spread across the network each peer gets the
exact distance between this ID and the target value of the round and
calculates the estimate with a mathematical formula described in the tech
report. The estimate generated with this method for a single round is not
very precise. Remember the case of the example, where the only peer is the
ID 44 and we happen to generate the target value 42, thinking there are
50 peers in the network. Therefore, the NSE subsystem remembers the last
64 estimates and calculates an average over them, giving a result which
usually has one bit of uncertainty (the real size could be half of the
estimate or twice as much). Note that the actual network size is
calculated in powers of two of the raw input, thus one bit of uncertainty
means a factor of two in the size estimate.

@cindex libgnunetnse
@node libgnunetnse
@subsection libgnunetnse

@c %**end of header

The NSE subsystem has the simplest API of all services, with only two
calls: @code{GNUNET_NSE_connect} and @code{GNUNET_NSE_disconnect}.

The connect call gets a callback function as a parameter and this function
is called each time the network agrees on an estimate. This usually is
once per round, with some exceptions: if the closest peer has a late
local clock and starts spreading its ID after everyone else agreed on a
value, the callback might be activated twice in a round, the second value
being always bigger than the first. The default round time is set to
1 hour.

The disconnect call disconnects from the NSE subsystem and the callback
is no longer called with new estimates.



@menu
* Results::
* libgnunetnse - Examples::
@end menu

@node Results
@subsubsection Results

@c %**end of header

The callback provides two values: the average and the
@uref{http://en.wikipedia.org/wiki/Standard_deviation, standard deviation}
of the last 64 rounds. The values provided by the callback function are
logarithmic, this means that the real estimate numbers can be obtained by
calculating 2 to the power of the given value (2^average). From a
statistics point of view this means that:

@itemize @bullet
@item 68% of the time the real size is included in the interval
[2^(average-stddev), 2^(average+stddev)]
@item 95% of the time the real size is included in the interval
[2^(average-2*stddev), 2^(average+2*stddev)]
@item 99.7% of the time the real size is included in the interval
[2^(average-3*stddev), 2^(average+3*stddev)]
@end itemize

The expected standard deviation for 64 rounds in a network of stable size
is 0.2. Thus, we can say that normally:

@itemize @bullet
@item 68% of the time the real size is in the range [-13%, +15%]
@item 95% of the time the real size is in the range [-24%, +32%]
@item 99.7% of the time the real size is in the range [-34%, +52%]
@end itemize

As said in the introduction, we can be quite sure that usually the real
size is between one third and three times the estimate. This can of
course vary with network conditions.
Thus, applications may want to also consider the provided standard
deviation value, not only the average (in particular, if the standard
deviation is very high, the average may be meaningless: the network size
is changing rapidly).

@node libgnunetnse - Examples
@subsubsection libgnunetnse - Examples

@c %**end of header

Let's close with a couple of examples.

@table @asis

@item Average: 10, std dev: 1 Here the estimate would be
2^10 = 1024 peers. @footnote{The range in which we can be 95% sure is:
[2^8, 2^12] = [256, 4096]. We can be very (>99.7%) sure that the network
is not a hundred peers and absolutely sure that it is not a million peers,
but somewhere around a thousand.}

@item Average 22, std dev: 0.2 Here the estimate would be
2^22 = 4 Million peers. @footnote{The range in which we can be 99.7% sure
is: [2^21.4, 2^22.6] = [2.8M, 6.3M]. We can be sure that the network size
is around four million, with absolutely no way of it being 1 million.}

@end table

To put this in perspective, recall that the LHC Higgs boson results were
announced with "5 sigma" and "6 sigma" certainties. In this case a
5 sigma minimum would be 2 million and a 6 sigma minimum, 1.8 million.

@node The NSE Client-Service Protocol
@subsection The NSE Client-Service Protocol

@c %**end of header

As with the API, the client-service protocol is very simple; it only has
two different messages, defined in @code{src/nse/nse.h}:

@itemize @bullet
@item @code{GNUNET_MESSAGE_TYPE_NSE_START}@ This message has no parameters
and is sent from the client to the service upon connection.
@item @code{GNUNET_MESSAGE_TYPE_NSE_ESTIMATE}@ This message is sent from
the service to the client for every new estimate and upon connection.
Contains a timestamp for the estimate, the average and the standard
deviation for the respective round.
@end itemize

When the @code{GNUNET_NSE_disconnect} API call is executed, the client
simply disconnects from the service, with no message involved.

@cindex NSE Peer-to-Peer Protocol
@node The NSE Peer-to-Peer Protocol
@subsection The NSE Peer-to-Peer Protocol

@c %**end of header

The NSE subsystem only has one message in the P2P protocol, the
@code{GNUNET_MESSAGE_TYPE_NSE_P2P_FLOOD} message.

This message's key contents are the timestamp to identify the round
(differences in system clocks may cause some peers to send messages way
too early or way too late, so the timestamp allows other peers to
identify such messages easily), the
@uref{http://en.wikipedia.org/wiki/Proof-of-work_system, proof of work}
used to make it difficult to mount a
@uref{http://en.wikipedia.org/wiki/Sybil_attack, Sybil attack}, and the
public key, which is used to verify the signature on the message.

Every peer stores a message for the previous, current and next round. The
messages for the previous and current round are given to peers that
connect to us. The message for the next round is simply stored until our
system clock advances to the next round. The message for the current round
is what we are flooding the network with right now.
At the beginning of each round the peer does the following:

@itemize @bullet
@item calculates its own distance to the target value
@item creates, signs and stores the message for the current round (unless
it has a better message in the "next round" slot which came early in the
previous round)
@item calculates, based on the stored round message (own or received) when
to start flooding it to its neighbors
@end itemize

Upon receiving a message the peer checks the validity of the message
(round, proof of work, signature). The next action depends on the
contents of the incoming message:

@itemize @bullet
@item if the message is worse than the current stored message, the peer
sends the current message back immediately, to stop the other peer from
spreading suboptimal results
@item if the message is better than the current stored message, the peer
stores the new message and calculates the new target time to start
spreading it to its neighbors (excluding the one the message came from)
@item if the message is for the previous round, it is compared to the
message stored in the "previous round slot", which may then be updated
@item if the message is for the next round, it is compared to the message
stored in the "next round slot", which again may then be updated
@end itemize

Finally, when the time comes to send the stored message for the current
round to the neighbors, a random delay is added for each neighbor, to
avoid traffic spikes and minimize cross-messages.

@cindex HOSTLIST Subsystem
@node HOSTLIST Subsystem
@section HOSTLIST Subsystem

@c %**end of header

Peers in the GNUnet overlay network need address information so that they
can connect with other peers. GNUnet uses so-called HELLO messages to
store and exchange peer addresses.
GNUnet provides several methods for peers to obtain this information:

@itemize @bullet
@item out-of-band exchange of HELLO messages (manually, using for example
gnunet-peerinfo)
@item HELLO messages shipped with GNUnet (automatic with distribution)
@item UDP neighbor discovery in LAN (IPv4 broadcast, IPv6 multicast)
@item topology gossiping (learning from other peers we already connected
to), and
@item the HOSTLIST daemon covered in this section, which is particularly
relevant for bootstrapping new peers.
@end itemize

New peers have no existing connections (and thus cannot learn from gossip
among peers), may not have other peers in their LAN and might be started
with an outdated set of HELLO messages from the distribution.
In this case, getting new peers to connect to the network requires either
manual effort or the use of a HOSTLIST to obtain HELLOs.

@menu
* HELLOs::
* Overview for the HOSTLIST subsystem::
* Interacting with the HOSTLIST daemon::
* Hostlist security address validation::
* The HOSTLIST daemon::
* The HOSTLIST server::
* The HOSTLIST client::
* Usage::
@end menu

@node HELLOs
@subsection HELLOs

@c %**end of header

The basic information peers require to connect to other peers is
contained in so-called HELLO messages, which you can think of as business
cards.
Besides the identity of the peer (based on the cryptographic public key) a
HELLO message may contain address information that specifies ways to
contact a peer. By obtaining HELLO messages, a peer can learn how to
contact other peers.

@node Overview for the HOSTLIST subsystem
@subsection Overview for the HOSTLIST subsystem

@c %**end of header

The HOSTLIST subsystem provides a way to distribute and obtain contact
information to connect to other peers using a simple HTTP GET request.
Its implementation is split into three parts: the main file for the daemon
itself (@file{gnunet-daemon-hostlist.c}), the HTTP client used to download
peer information (@file{hostlist-client.c}) and the server component used
to provide this information to other peers (@file{hostlist-server.c}).
The server is basically a small HTTP web server (based on GNU
libmicrohttpd) which provides a list of HELLOs known to the local peer for
download. The client component is basically an HTTP client
(based on libcurl) which can download hostlists from one or more websites.
The hostlist format is a binary blob containing a sequence of HELLO
messages. Note that any HTTP server can theoretically serve a hostlist;
the built-in hostlist server simply makes it convenient to offer this
service.


@menu
* Features::
* HOSTLIST - Limitations::
@end menu

@node Features
@subsubsection Features

@c %**end of header

The HOSTLIST daemon can:

@itemize @bullet
@item provide HELLO messages with validated addresses obtained from
PEERINFO to download for other peers
@item download HELLO messages and forward these message to the TRANSPORT
subsystem for validation
@item advertise the URL of this peer's hostlist address to other peers
via gossip
@item automatically learn about hostlist servers from the gossip of other
peers
@end itemize

@node HOSTLIST - Limitations
@subsubsection HOSTLIST - Limitations

@c %**end of header

The HOSTLIST daemon does not:

@itemize @bullet
@item verify the cryptographic information in the HELLO messages
@item verify the address information in the HELLO messages
@end itemize

@node Interacting with the HOSTLIST daemon
@subsection Interacting with the HOSTLIST daemon

@c %**end of header

The HOSTLIST subsystem is currently implemented as a daemon, so there is
no need for the user to interact with it and therefore there is no
command line tool and no API to communicate with the daemon. In the
future, we can envision changing this to allow users to manually trigger
the download of a hostlist.

Since there is no command line interface to interact with HOSTLIST, the
only way to interact with the hostlist is to use STATISTICS to obtain or
modify information about the status of HOSTLIST:

@example
$ gnunet-statistics -s hostlist
@end example

@noindent
In particular, HOSTLIST includes a @strong{persistent} value in statistics
that specifies when the hostlist server might be queried next. As this
value is exponentially increasing during runtime, developers may want to
reset or manually adjust it. Note that HOSTLIST (but not STATISTICS) needs
to be shut down if changes to this value are to have any effect on the
daemon (as HOSTLIST does not monitor STATISTICS for changes to the
download frequency).

@node Hostlist security address validation
@subsection Hostlist security address validation

@c %**end of header

Since information obtained from other parties cannot be trusted without
validation, we have to distinguish between @emph{validated} and
@emph{not validated} addresses. Before using (and so trusting)
information from other parties, this information has to be double-checked
(validated). Address validation is not done by HOSTLIST but by the
TRANSPORT service.

The HOSTLIST component is functionally located between the PEERINFO and
the TRANSPORT subsystem. When acting as a server, the daemon obtains valid
(@emph{validated}) peer information (HELLO messages) from the PEERINFO
service and provides it to other peers. When acting as a client, it
contacts the HOSTLIST servers specified in the configuration, downloads
the (unvalidated) list of HELLO messages and forwards this information
to the TRANSPORT service to validate the addresses.

@cindex HOSTLIST daemon
@node The HOSTLIST daemon
@subsection The HOSTLIST daemon

@c %**end of header

The hostlist daemon is the main component of the HOSTLIST subsystem. It is
started by the ARM service and (if configured) starts the HOSTLIST client
and server components.

If the daemon provides a hostlist itself, it can advertise its own
hostlist to other peers. To do so, it sends a
@code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT} message to other peers
when they connect to this peer on the CORE level. This hostlist
advertisement message contains the URL to access the HOSTLIST HTTP
server of the sender. The daemon may also subscribe to this type of
message from the CORE service, and then forwards such messages to the
HOSTLIST client. The client then uses all available URLs to download peer
information when necessary.

When starting, the HOSTLIST daemon first connects to the CORE subsystem
and, if hostlist learning is enabled, registers a CORE handler to receive
these messages. Next it starts (if configured) the client and
server. It passes pointers to CORE connect, disconnect and receive
handlers where the client and server store their functions, so the daemon
can notify them about CORE events.

To clean up on shutdown, the daemon has a cleanup task which shuts down
all subsystems and disconnects from CORE.

@cindex HOSTLIST server
@node The HOSTLIST server
@subsection The HOSTLIST server

@c %**end of header

The server provides a way for other peers to obtain HELLOs. Basically it
is a small web server other peers can connect to and download a list of
HELLOs using standard HTTP; it may also advertise the URL of the hostlist
to other peers connecting on CORE level.


@menu
* The HTTP Server::
* Advertising the URL::
@end menu

@node The HTTP Server
@subsubsection The HTTP Server

@c %**end of header

During startup, the server starts a web server listening on the port
specified with the HTTPPORT value (default 8080). In addition it connects
to the PEERINFO service to obtain peer information. The HOSTLIST server
uses the GNUNET_PEERINFO_iterate function to request HELLO information for
all peers and adds their information to a new hostlist if they are
suitable (expired addresses and HELLOs without addresses are not
suitable) and the maximum size for a hostlist is not exceeded
(MAX_BYTES_PER_HOSTLISTS = 500000).
When PEERINFO finishes (with a last NULL callback), the server destroys
the previous hostlist response available for download on the web server
and replaces it with the updated hostlist. The hostlist format is
basically a sequence of HELLO messages (as obtained from PEERINFO) without
any special tokenization. Since each HELLO message contains a size field,
the response can easily be split into separate HELLO messages by the
client.
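Since each message carries its own size field, a client can split the
downloaded blob without any delimiters. The following self-contained
sketch illustrates this; it assumes the standard GNUnet message header
layout (a 16-bit size followed by a 16-bit type, both in network byte
order), but the names are simplified stand-ins, not the actual GNUnet
types:

```c
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Simplified stand-in for the GNUnet message header: a 16-bit total
 * size (covering the header itself) and a 16-bit type, both in
 * network byte order. */
struct MessageHeader
{
  uint16_t size;
  uint16_t type;
};

static int msg_count;

static void
count_message (const char *msg, uint16_t size)
{
  (void) msg;
  (void) size;
  msg_count++;
}

/* Invoke 'cb' once per complete message in 'buf'; returns the number
 * of messages found, or -1 if the blob is malformed or truncated. */
static int
split_hostlist (const char *buf, size_t len,
                void (*cb) (const char *msg, uint16_t size))
{
  size_t off = 0;
  int count = 0;

  while (off + sizeof (struct MessageHeader) <= len)
  {
    struct MessageHeader hdr;

    memcpy (&hdr, buf + off, sizeof (hdr));
    uint16_t size = ntohs (hdr.size);
    if ((size < sizeof (hdr)) || (off + size > len))
      return -1; /* malformed or truncated blob */
    cb (buf + off, size);
    off += size;
    count++;
  }
  return (off == len) ? count : -1;
}
```

A client feeding the HTTP response body through such a loop obtains one
complete HELLO per callback invocation.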

A HOSTLIST client connecting to the HOSTLIST server will receive the
hostlist as an HTTP response, and the server will terminate the
connection with the result code @code{HTTP 200 OK}.
The connection is closed immediately if no hostlist is available.

@node Advertising the URL
@subsubsection Advertising the URL

@c %**end of header

The server also advertises the URL to download the hostlist to other peers
if hostlist advertisement is enabled.
When a new peer connects and has hostlist learning enabled, the server
sends a @code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT} message to this
peer using the CORE service.

@cindex HOSTLIST client
@node The HOSTLIST client
@subsection The HOSTLIST client

@c %**end of header

The client provides the functionality to download the list of HELLOs from
a set of URLs.
It performs a standard HTTP request to the URLs configured and learned
from advertisement messages received from other peers. When a HELLO is
downloaded, the HOSTLIST client forwards the HELLO to the TRANSPORT
service for validation.

The client supports two modes of operation:

@itemize @bullet
@item download of HELLOs (bootstrapping)
@item learning of URLs
@end itemize

@menu
* Bootstrapping::
* Learning::
@end menu

@node Bootstrapping
@subsubsection Bootstrapping

@c %**end of header

For bootstrapping, it schedules a task to download the hostlist from the
set of known URLs.
The downloads are only performed if the number of current
connections is smaller than a minimum number of connections
(at the moment 4).
The interval between downloads increases exponentially. However, once the
interval becomes longer than an hour, further growth is capped at
(number of connections * 1h).
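The scheduling policy described above can be sketched as follows; the
function and constant names here are illustrative, not the actual GNUnet
symbols:

```c
#include <stdint.h>

/* Illustrative sketch of the HOSTLIST client's download scheduling:
 * downloads happen only while poorly connected, and the retry delay
 * doubles until it exceeds one hour, at which point it is capped at
 * (number of connections * 1 hour). */
#define ONE_HOUR_S      3600
#define MIN_CONNECTIONS 4

/* Downloads are only attempted while the peer is poorly connected. */
static int
should_download (unsigned int connections)
{
  return connections < MIN_CONNECTIONS;
}

static uint64_t
next_download_delay (uint64_t current_delay_s, unsigned int connections)
{
  uint64_t next = current_delay_s * 2;

  if (next > ONE_HOUR_S)
  {
    uint64_t cap = (uint64_t) connections * ONE_HOUR_S;

    if (cap < ONE_HOUR_S)
      cap = ONE_HOUR_S; /* never cap below one hour */
    if (next > cap)
      next = cap;
  }
  return next;
}
```

For example, a peer with one connection would see its delay double up to
one hour and then stay there, while a better-connected peer tolerates a
proportionally longer maximum delay.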

Once the decision has been taken to download HELLOs, the daemon chooses a
random URL from the list of known URLs. URLs can be configured in the
configuration or be learned from advertisement messages.
The client uses a HTTP client library (libcurl) to initiate the download
using the libcurl multi interface.
Libcurl passes the data to the callback_download function, which
stores the data in a buffer if space is available and the maximum size for
a hostlist download is not exceeded (MAX_BYTES_PER_HOSTLISTS = 500000).
When a full HELLO has been downloaded, the HOSTLIST client offers this
HELLO message to the TRANSPORT service for validation.
When the download finishes or fails, statistical information about the
quality of this URL is updated.

@cindex HOSTLIST learning
@node Learning
@subsubsection Learning

@c %**end of header

The client also manages hostlist advertisements from other peers. The
HOSTLIST daemon forwards @code{GNUNET_MESSAGE_TYPE_HOSTLIST_ADVERTISEMENT}
messages to the client subsystem, which extracts the URL from the message.
Next, a test of the newly obtained URL is performed by triggering a
download from the new URL. If the URL works correctly, it is added to the
list of working URLs.

The size of the list of URLs is restricted, so if an additional server is
added and the list is full, the URL with the worst quality ranking
(determined, for example, by the number of successful downloads and
HELLOs obtained) is discarded. During shutdown the list of URLs is saved
to a file for persistence and loaded on startup. URLs from the
configuration file are never discarded.
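The eviction policy described above can be sketched roughly as follows;
the scalar quality score and all names are illustrative simplifications,
not the actual implementation:

```c
#include <string.h>

/* Sketch of a bounded hostlist-URL store: when the list is full, the
 * entry with the worst quality score is evicted, but entries that came
 * from the configuration file are never dropped. */
#define MAX_URLS 4

struct UrlEntry
{
  const char *url;
  long quality;    /* e.g. derived from download successes and HELLO count */
  int from_config; /* 1 if listed in the configuration file */
  int in_use;
};

static struct UrlEntry url_list[MAX_URLS];

/* Store the URL, evicting the worst non-configured entry if needed.
 * Returns the slot used, or -1 if every slot holds a configured entry. */
static int
add_url (const char *url, long quality, int from_config)
{
  int victim = -1;

  for (int i = 0; i < MAX_URLS; i++)
  {
    if (! url_list[i].in_use)
    {
      victim = i; /* free slot available */
      break;
    }
    if (url_list[i].from_config)
      continue; /* configured URLs are never discarded */
    if ((victim == -1) || (url_list[i].quality < url_list[victim].quality))
      victim = i;
  }
  if (victim == -1)
    return -1; /* list consists entirely of configured entries */
  url_list[victim] = (struct UrlEntry) { url, quality, from_config, 1 };
  return victim;
}
```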

@node Usage
@subsection Usage

@c %**end of header

To start HOSTLIST by default, it has to be added to the DEFAULTSERVICES
section for the ARM services. This is done in the default configuration.

For more information on how to configure the HOSTLIST subsystem see the
installation handbook:@
Configuring the hostlist to bootstrap@
Configuring your peer to provide a hostlist

@cindex IDENTITY Subsystem
@node IDENTITY Subsystem
@section IDENTITY Subsystem

@c %**end of header

Identities of "users" in GNUnet are called egos.
Egos can be used as pseudonyms ("fake names") or be tied to an
organization (for example, "GNU") or even the actual identity of a human.
GNUnet users are expected to have many egos. They might have one tied to
their real identity, some for organizations they manage, and more for
different domains where they want to operate under a pseudonym.

The IDENTITY service allows users to manage their egos. The identity
service manages the private keys of the egos of the local user; it does
not manage identities of other users (public keys). Public keys for other
users need names to become manageable. GNUnet uses the
@dfn{GNU Name System} (GNS) to give names to other users and manage their
public keys securely. This chapter is about the IDENTITY service,
which is about the management of private keys.

On the network, an ego corresponds to an ECDSA key (over Curve25519,
using RFC 6979, as required by GNS). Thus, users can perform actions
under a particular ego by using (signing with) a particular private key.
Other users can then confirm that the action was really performed by that
ego by checking the signature against the respective public key.

The IDENTITY service allows users to associate a human-readable name with
each ego. This way, users can use names that will remind them of the
purpose of a particular ego.
The IDENTITY service will store the respective private keys and
allows applications to access key information by name.
Users can change the name that is locally (!) associated with an ego.
Egos can also be deleted, which means that the private key will be removed
and it thus will not be possible to perform actions with that ego in the
future.

Additionally, the IDENTITY subsystem can associate service functions with
egos.
For example, GNS requires the ego that should be used for the shorten
zone. GNS will ask IDENTITY for an ego for the "gns-short" service.
The IDENTITY service has a mapping of such service strings to the name of
the ego that the user wants to use for this service, for example
"my-short-zone-ego".

Finally, the IDENTITY API provides access to a special ego, the
anonymous ego. The anonymous ego is special in that its private key is not
really private, but fixed and known to everyone.
Thus, anyone can perform actions as anonymous. This can be useful as with
this trick, code does not have to contain a special case to distinguish
between anonymous and pseudonymous egos.

@menu
* libgnunetidentity::
* The IDENTITY Client-Service Protocol::
@end menu

@cindex libgnunetidentity
@node libgnunetidentity
@subsection libgnunetidentity
@c %**end of header


@menu
* Connecting to the service::
* Operations on Egos::
* The anonymous Ego::
* Convenience API to lookup a single ego::
* Associating egos with service functions::
@end menu

@node Connecting to the service
@subsubsection Connecting to the service

@c %**end of header

First, typical clients connect to the identity service using
@code{GNUNET_IDENTITY_connect}. This function takes a callback as a
parameter.
If the given callback parameter is non-null, it will be invoked to notify
the application about the current state of the identities in the system.

@itemize @bullet
@item First, it will be invoked on all known egos at the time of the
connection. For each ego, a handle to the ego and the user's name for the
ego will be passed to the callback. Furthermore, a @code{void **} context
argument will be provided which gives the client the opportunity to
associate some state with the ego.
@item Second, the callback will be invoked with NULL for the ego, the name
and the context. This signals that the (initial) iteration over all egos
has completed.
@item Then, the callback will be invoked whenever something changes about
an ego.
If an ego is renamed, the callback is invoked with the ego handle of the
ego that was renamed, and the new name. If an ego is deleted, the callback
is invoked with the ego handle and a name of NULL. In the deletion case,
the application should also release resources stored in the context.
@item When the application destroys the connection to the identity service
using @code{GNUNET_IDENTITY_disconnect}, the callback is again invoked
with the ego and a name of NULL (equivalent to deletion of the egos).
This should again be used to clean up the per-ego context.
@end itemize

The ego handle passed to the callback remains valid until the callback is
invoked with a name of NULL, so it is safe to store a reference to the
ego's handle.
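The callback contract described above can be simulated with a
self-contained sketch; the types and the driver function below are
simplified, hypothetical stand-ins for the actual IDENTITY API, meant
only to show the iterate-then-NULL-terminator pattern:

```c
#include <stddef.h>

/* Simplified stand-in for a GNUnet ego handle. */
struct Ego
{
  const char *pk;
};

/* Simplified stand-in for the GNUNET_IDENTITY_connect callback type:
 * closure, ego handle, per-ego context slot, and the ego's name. */
typedef void (*IdentityCallback) (void *cls, struct Ego *ego,
                                  void **ctx, const char *name);

/* Drive the initial iteration over a static ego table, then signal the
 * end of the iteration with a NULL ego/name (as the service does). */
static void
run_initial_iteration (IdentityCallback cb, void *cls)
{
  static struct Ego egos[] = { { "key-A" }, { "key-B" } };
  static void *contexts[2];

  for (size_t i = 0; i < sizeof (egos) / sizeof (egos[0]); i++)
    cb (cls, &egos[i], &contexts[i], (0 == i) ? "work" : "private");
  cb (cls, NULL, NULL, NULL); /* end of initial iteration */
}

/* Example client callback: counts egos until the NULL terminator. */
static void
count_cb (void *cls, struct Ego *ego, void **ctx, const char *name)
{
  int *state = cls;

  (void) ctx;
  (void) name;
  if (NULL == ego)
    state[1] = 1; /* initial iteration complete */
  else
    state[0]++;   /* one more known ego */
}
```

A real client would additionally handle the rename, delete and
disconnect cases described above in the same callback.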

@node Operations on Egos
@subsubsection Operations on Egos

@c %**end of header

Given an ego handle, the main operations are to get its associated private
key using @code{GNUNET_IDENTITY_ego_get_private_key} or its associated
public key using @code{GNUNET_IDENTITY_ego_get_public_key}.

The other operations on egos are pretty straightforward.
Using @code{GNUNET_IDENTITY_create}, an application can request the
creation of an ego by specifying the desired name.
The operation will fail if that name is
already in use. Using @code{GNUNET_IDENTITY_rename} the name of an
existing ego can be changed. Finally, egos can be deleted using
@code{GNUNET_IDENTITY_delete}. All of these operations will trigger
updates to the callback given to the @code{GNUNET_IDENTITY_connect}
function of all applications that are connected with the identity service
at the time. @code{GNUNET_IDENTITY_cancel} can be used to cancel the
operations before the respective continuations would be called.
Note that cancellation does not guarantee that the operation will not be
completed anyway; it only guarantees that the continuation will no longer
be called.

@node The anonymous Ego
@subsubsection The anonymous Ego

@c %**end of header

A special way to obtain an ego handle is to call
@code{GNUNET_IDENTITY_ego_get_anonymous}, which returns an ego for the
"anonymous" user --- anyone knows and can get the private key for this
user, so it is suitable for operations that are supposed to be anonymous
but require signatures (for example, to avoid a special path in the code).
The anonymous ego is always valid and accessing it does not require a
connection to the identity service.

@node Convenience API to lookup a single ego
@subsubsection Convenience API to lookup a single ego


As applications commonly just have to look up a single ego, there is a
convenience API to do exactly that. Use @code{GNUNET_IDENTITY_ego_lookup}
to look up a single ego by name. Note that this is the user's name for the
ego, not the service function. The resulting ego will be returned via a
callback and will only be valid during that callback. The operation can
be cancelled via @code{GNUNET_IDENTITY_ego_lookup_cancel}
(cancellation is only legal before the callback is invoked).

@node Associating egos with service functions
@subsubsection Associating egos with service functions


The @code{GNUNET_IDENTITY_set} function is used to associate a particular
ego with a service function. The name used by the service and the ego are
given as arguments.
Afterwards, the service can use its name to look up the associated ego
using @code{GNUNET_IDENTITY_get}.

@node The IDENTITY Client-Service Protocol
@subsection The IDENTITY Client-Service Protocol

@c %**end of header

A client connecting to the identity service first sends a message with
type
@code{GNUNET_MESSAGE_TYPE_IDENTITY_START} to the service. After that, the
client will receive information about changes to the egos by receiving
messages of type @code{GNUNET_MESSAGE_TYPE_IDENTITY_UPDATE}.
Those messages contain the private key of the ego and the user's name of
the ego (or zero bytes for the name to indicate that the ego was deleted).
A special bit @code{end_of_list} is used to indicate the end of the
initial iteration over the identity service's egos.

The client can trigger changes to the egos by sending @code{CREATE},
@code{RENAME} or @code{DELETE} messages.
The CREATE message contains the private key and the desired name.@
The RENAME message contains the old name and the new name.@
The DELETE message only needs to include the name of the ego to delete.@
The service responds to each of these messages with a @code{RESULT_CODE}
message which indicates success or error of the operation, and possibly
a human-readable error message.

Finally, the client can bind the name of a service function to an ego by
sending a @code{SET_DEFAULT} message with the name of the service function
and the private key of the ego.
Such bindings can then be resolved using a @code{GET_DEFAULT} message,
which includes the name of the service function. The identity service
will respond to a GET_DEFAULT request with a SET_DEFAULT message
containing the respective information, or with a RESULT_CODE to
indicate an error.

@cindex NAMESTORE Subsystem
@node NAMESTORE Subsystem
@section NAMESTORE Subsystem

The NAMESTORE subsystem provides persistent storage for local GNS zone
information. All local GNS zone information is managed by NAMESTORE. It
provides both the functionality to administer local GNS information (e.g.
delete and add records) as well as to retrieve GNS information (e.g. to
list name information in a client).
NAMESTORE only manages the persistent storage of zone information
belonging to the user running the service: GNS information from other
users obtained from the DHT is stored by the NAMECACHE subsystem.

NAMESTORE uses a plugin-based database backend to store GNS information
with good performance; currently SQLite, MySQL and PostgreSQL are
supported as database backends.
NAMESTORE clients interact with the IDENTITY subsystem to obtain
cryptographic information about zones based on egos as described with the
IDENTITY subsystem, but internally NAMESTORE refers to zones using the
ECDSA private key.
In addition, it collaborates with the NAMECACHE subsystem: when local
information is modified, the updated zone information is stored in the
GNS cache to increase look-up performance for local information.

NAMESTORE provides functionality to look up and store records, to iterate
over a specific zone or all zones, and to monitor zones for changes.
NAMESTORE functionality can be accessed using the NAMESTORE API or the
NAMESTORE command line tool.

@menu
* libgnunetnamestore::
@end menu

@cindex libgnunetnamestore
@node libgnunetnamestore
@subsection libgnunetnamestore

To interact with NAMESTORE, clients first connect to the NAMESTORE
service using @code{GNUNET_NAMESTORE_connect}, passing a configuration
handle. As a result they obtain a NAMESTORE handle they can use for
operations, or NULL if the connection failed.

To disconnect from NAMESTORE, clients use
@code{GNUNET_NAMESTORE_disconnect} and specify the handle to disconnect.

NAMESTORE internally uses the ECDSA private key to refer to zones. These
private keys can be obtained from the IDENTITY subsystem.
Here egos can be used to refer to zones, or the default ego assigned to
the GNS subsystem can be used to obtain the master zone's private key.


@menu
* Editing Zone Information::
* Iterating Zone Information::
* Monitoring Zone Information::
@end menu

@node Editing Zone Information
@subsubsection Editing Zone Information

@c %**end of header

NAMESTORE provides functions to lookup records stored under a label in a
zone and to store records under a label in a zone.

To store (and delete) records, the client uses the
@code{GNUNET_NAMESTORE_records_store} function and has to provide the
namestore handle to use, the private key of the zone, the label to store
the records under, the records and the number of records, plus a callback
function.
After the operation is performed, NAMESTORE will call the provided
callback function with the result: @code{GNUNET_SYSERR} on failure
(including timeout, queue drop or failure to validate), @code{GNUNET_NO}
if the content was already there (or, in the case of deletion, not
found), and @code{GNUNET_YES} (or another positive value) on success,
plus an additional error message.

Records are deleted by using the store command with 0 records to store.
It is important to note that records are not merged when records already
exist under the label: a store operation replaces them.
A client therefore first has to retrieve the existing records, merge them
with the new records, and then store the result.
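The retrieve-merge-store pattern this implies can be sketched as follows;
the record type is a simplified, hypothetical stand-in for
@code{struct GNUNET_GNSRECORD_Data}, and the merge policy shown (a union
that skips exact duplicates) is only one plausible choice:

```c
#include <string.h>

/* Simplified stand-in for a GNS record: a numeric type and a value. */
struct Record
{
  int type;
  const char *value;
};

static int
record_equal (const struct Record *a, const struct Record *b)
{
  return (a->type == b->type) && (0 == strcmp (a->value, b->value));
}

/* Merge the records already stored under a label ('old') with the
 * records to add, writing the union (without exact duplicates) to
 * 'out', which must have room for old_count + add_count entries.
 * Returns the merged count; the result is what the client would then
 * pass to the store operation. */
static unsigned int
merge_records (const struct Record *old, unsigned int old_count,
               const struct Record *add, unsigned int add_count,
               struct Record *out)
{
  unsigned int n = old_count;

  memcpy (out, old, old_count * sizeof (*old));
  for (unsigned int i = 0; i < add_count; i++)
  {
    int dup = 0;

    for (unsigned int j = 0; j < n; j++)
      if (record_equal (&add[i], &out[j]))
        dup = 1;
    if (! dup)
      out[n++] = add[i];
  }
  return n;
}
```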

To perform a lookup operation, the client uses the
@code{GNUNET_NAMESTORE_records_lookup} function. Here the client has to
pass the namestore handle, the private key of the zone and the label. The
client also has to provide a callback function which will be called with
the result of the lookup operation:
the zone for the records, the label, and the records including the
number of records included.

A special operation is used to set the preferred nickname for a zone.
This nickname is stored with the zone and is automatically merged with
all labels and records stored in a zone. Here the client uses the
@code{GNUNET_NAMESTORE_set_nick} function and passes the private key of
the zone and the nickname as a string, plus a callback to be called with
the result of the operation.

@node Iterating Zone Information
@subsubsection Iterating Zone Information

@c %**end of header

A client can iterate over all information in a zone or all zones managed
by NAMESTORE.
Here a client uses the @code{GNUNET_NAMESTORE_zone_iteration_start}
function and passes the namestore handle, the zone to iterate over and a
callback function to call with the result.
If the client wants to iterate over all zones, it passes NULL for the
zone. A @code{GNUNET_NAMESTORE_ZoneIterator} handle is returned to be
used to continue the iteration.

NAMESTORE calls the callback for every result and expects the client to
call @code{GNUNET_NAMESTORE_zone_iterator_next} to continue to iterate or
@code{GNUNET_NAMESTORE_zone_iterator_stop} to interrupt the iteration.
When NAMESTORE has reached the last item, it will call the callback with
a NULL value to indicate the end of the iteration.

@node Monitoring Zone Information
@subsubsection Monitoring Zone Information

@c %**end of header

Clients can also monitor zones to be notified about changes. Here the
client uses the @code{GNUNET_NAMESTORE_zone_monitor_start} function and
passes the private key of the zone and a callback function to call
with updates for the zone.
The client can request to obtain the existing zone information first by
iterating over the zone, and can specify a synchronization callback to be
called when the client and the namestore are synced.

On an update, NAMESTORE will call the callback with the private key of the
zone, the label and the records and their number.

To stop monitoring, the client calls
@code{GNUNET_NAMESTORE_zone_monitor_stop} and passes the handle obtained
from the function to start the monitoring.

@cindex PEERINFO Subsystem
@node PEERINFO Subsystem
@section PEERINFO Subsystem

@c %**end of header

The PEERINFO subsystem is used to store verified (validated) information
about known peers in a persistent way. It obtains these addresses, for
example, from the TRANSPORT service, which is in charge of address
validation.
Validation means that the information in the HELLO message is checked by
connecting to the addresses and performing a cryptographic handshake to
authenticate the peer instance claiming to be reachable with these
addresses.
PEERINFO does not validate the HELLO messages itself; it only stores them
and gives them to interested clients.

As future work, we are considering moving from storing just HELLO
messages to providing a generic persistent per-peer information store,
as more and more subsystems need to store per-peer information in a
persistent way.
To avoid duplicating this functionality, we plan to provide a PEERSTORE
service offering it.

@menu
* PEERINFO - Features::
* PEERINFO - Limitations::
* DeveloperPeer Information::
* Startup::
* Managing Information::
* Obtaining Information::
* The PEERINFO Client-Service Protocol::
* libgnunetpeerinfo::
@end menu

@node PEERINFO - Features
@subsection PEERINFO - Features

@c %**end of header

@itemize @bullet
@item Persistent storage
@item Client notification mechanism on update
@item Periodic clean up for expired information
@item Differentiation between public and friend-only HELLO
@end itemize

@node PEERINFO - Limitations
@subsection PEERINFO - Limitations


@itemize @bullet
@item Does not perform HELLO validation
@end itemize

@node DeveloperPeer Information
@subsection DeveloperPeer Information

@c %**end of header

The PEERINFO subsystem stores this information in the form of HELLO
messages, which you can think of as business cards.
These HELLO messages contain the public key of a peer and the addresses
the peer can be reached under.
The addresses include an expiration date describing how long they are
valid. This information is updated regularly by the TRANSPORT service by
revalidating the addresses.
If an address is expired and not renewed, it can be removed from the
HELLO message.

Some peers do not want their HELLO messages distributed to other
peers, especially when GNUnet's friend-to-friend mode is enabled.
To prevent this undesired distribution, PEERINFO distinguishes between
@emph{public} and @emph{friend-only} HELLO messages.
Public HELLO messages can be freely distributed to other (possibly
unknown) peers (for example using the hostlist, gossiping, broadcasting),
whereas friend-only HELLO messages may not be distributed to other peers.
Friend-only HELLO messages have an additional flag @code{friend_only} set
internally; for public HELLO messages this flag is not set.
PEERINFO does not (and cannot) check whether a client is allowed to
obtain a specific HELLO type.

The HELLO messages can be managed using the GNUnet HELLO library.
Other GNUnet subsystems can obtain this information from PEERINFO and use
it for their purposes.
Example clients are the HOSTLIST component, which provides this
information to other peers in the form of a hostlist, and the TRANSPORT
subsystem, which uses this information to maintain connections to other
peers.

@node Startup
@subsection Startup

@c %**end of header

During startup the PEERINFO service loads persistent HELLOs from disk.
First, PEERINFO parses the directory configured in the HOSTS value of the
@code{PEERINFO} configuration section, where PEERINFO information is
stored, and extracts valid HELLO messages from all files found there.
In addition it loads HELLO messages shipped with the GNUnet distribution.
These HELLOs are used to simplify network bootstrapping by providing
valid peer information with the distribution.
The use of these HELLOs can be prevented by setting
@code{USE_INCLUDED_HELLOS} in the @code{PEERINFO} configuration section to
@code{NO}. Files containing invalid information are removed.

@node Managing Information
@subsection Managing Information

@c %**end of header

The PEERINFO service stores information about known peers, keeping a
single HELLO message for every peer.
A peer does not need to have a HELLO if no information is available.
HELLO information from different sources, for example a HELLO obtained
from a remote HOSTLIST and a second HELLO stored on disk, is combined
and merged into one single HELLO message per peer which will be given to
clients. During this merge process the HELLO is immediately written to
disk to ensure persistence.

In addition, PEERINFO periodically scans the directory where the
information is stored for HELLO messages with expired TRANSPORT
addresses.
This periodic task scans all files in the directory and recreates the
HELLO messages it finds.
Expired TRANSPORT addresses are removed from the HELLO, and if the
HELLO does not contain any valid addresses, it is discarded and removed
from disk.

@node Obtaining Information
@subsection Obtaining Information

@c %**end of header

When a client requests information from PEERINFO, PEERINFO performs a
lookup for the respective peer or all peers if desired and transmits this
information to the client.
The client can specify whether friend-only HELLOs are to be included,
and PEERINFO filters the respective HELLO messages before transmitting
the information.

To notify clients about changes to PEERINFO information, PEERINFO
maintains a list of clients interested in such notifications.
A notification occurs if a HELLO for a peer was updated (due to a
merge, for example) or a new peer was added.

@node The PEERINFO Client-Service Protocol
@subsection The PEERINFO Client-Service Protocol

@c %**end of header

To connect to and disconnect from the PEERINFO service, PEERINFO
utilizes the util client/server infrastructure, so no special message
types are used here.

To add information for a peer, the plain HELLO message is transmitted to
the service without any wrapping. All pieces of information required are
stored within the HELLO message.
The PEERINFO service provides a message handler accepting and processing
these HELLO messages.

When obtaining PEERINFO information using the iterate functionality,
specific messages are used. To obtain information for all peers, a
@code{struct ListAllPeersMessage} with message type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_GET_ALL} is transmitted; it carries a
flag @code{include_friend_only} indicating whether friend-only HELLO
messages should be included. If information for a specific peer is
required, a @code{struct ListAllPeersMessage} with
@code{GNUNET_MESSAGE_TYPE_PEERINFO_GET} containing the peer identity is
used.

For both variants, the PEERINFO service replies with one
@code{struct ListAllPeersMessage} of type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO}, containing the plain HELLO,
for each HELLO message it wants to transmit.
The final message is a @code{struct GNUNET_MessageHeader} with type
@code{GNUNET_MESSAGE_TYPE_PEERINFO_INFO_END}. When the client receives
this message, it can proceed with the next request if any is pending.

@node libgnunetpeerinfo
@subsection libgnunetpeerinfo

@c %**end of header

The PEERINFO API consists mainly of three different functionalities:

@itemize @bullet
@item maintaining a connection to the service
@item adding new information to the PEERINFO service
@item retrieving information from the PEERINFO service
@end itemize

@menu
* Connecting to the PEERINFO Service::
* Adding Information to the PEERINFO Service::
* Obtaining Information from the PEERINFO Service::
@end menu

@node Connecting to the PEERINFO Service
@subsubsection Connecting to the PEERINFO Service

@c %**end of header

To connect to the PEERINFO service, the function
@code{GNUNET_PEERINFO_connect} is used, taking a configuration handle as
an argument. To disconnect from PEERINFO, the function
@code{GNUNET_PEERINFO_disconnect} has to be called, taking the PEERINFO
handle returned from the connect function.

@node Adding Information to the PEERINFO Service
@subsubsection Adding Information to the PEERINFO Service

@c %**end of header

@code{GNUNET_PEERINFO_add_peer} adds a new peer to the PEERINFO subsystem
storage. This function takes the PEERINFO handle as an argument, the HELLO
message to store and a continuation with a closure to be called with the
result of the operation.
@code{GNUNET_PEERINFO_add_peer} returns a handle to this operation,
which allows cancelling it with the respective cancel function
@code{GNUNET_PEERINFO_add_peer_cancel}. To retrieve information from
PEERINFO you can iterate over all information stored with PEERINFO, or
you can ask PEERINFO to notify you when new peer information is
available.

@node Obtaining Information from the PEERINFO Service
@subsubsection Obtaining Information from the PEERINFO Service

@c %**end of header

To iterate over information in PEERINFO you use
@code{GNUNET_PEERINFO_iterate}.
This function expects the PEERINFO handle, a flag indicating whether
HELLO messages intended for friend-only mode should be included, a
timeout specifying how long the operation may take, and a callback with a
callback closure to be called with the results.
If you want to obtain information for a specific peer, you can specify
the peer identity; if this identity is NULL, information for all peers is
returned. The function returns a handle that allows cancelling the
operation using @code{GNUNET_PEERINFO_iterate_cancel}.

To get notified when peer information changes, you can use
@code{GNUNET_PEERINFO_notify}.
This function expects a configuration handle and a flag indicating
whether friend-only HELLO messages should be included. The PEERINFO
service will then invoke the given callback for every change. The
function returns a handle that can be used to cancel notifications
with @code{GNUNET_PEERINFO_notify_cancel}.
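The iteration described above can be sketched like this (hedged: some GNUnet versions additionally take a timeout argument; consult @file{gnunet_peerinfo_service.h} for the exact signature):

```c
#include <gnunet/gnunet_peerinfo_service.h>

/* Iterator callback: called once per known peer, then once with
 * peer == NULL to signal the end of the iteration (or an error,
 * reported in err_msg). */
static void
print_peer (void *cls,
            const struct GNUNET_PeerIdentity *peer,
            const struct GNUNET_HELLO_Message *hello,
            const char *err_msg)
{
  if (NULL == peer)
    return;  /* end of list (or error, see err_msg) */
  /* inspect 'hello' for the peer's addresses here */
}

static void
list_peers (struct GNUNET_PEERINFO_Handle *pi)
{
  GNUNET_PEERINFO_iterate (pi,
                           GNUNET_NO,   /* no friend-only HELLOs */
                           NULL,        /* NULL: all peers */
                           &print_peer, NULL);
}
```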

@cindex PEERSTORE Subsystem
@node PEERSTORE Subsystem
@section PEERSTORE Subsystem

@c %**end of header

GNUnet's PEERSTORE subsystem offers persistent per-peer storage for other
GNUnet subsystems. GNUnet subsystems can use PEERSTORE to persistently
store and retrieve arbitrary data.
Each data record stored with PEERSTORE contains the following fields:

@itemize @bullet
@item subsystem: Name of the subsystem responsible for the record.
@item peerid: Identity of the peer this record is related to.
@item key: a key string identifying the record.
@item value: binary record value.
@item expiry: record expiry date.
@end itemize

@menu
* Functionality::
* Architecture::
* libgnunetpeerstore::
@end menu

@node Functionality
@subsection Functionality

@c %**end of header

Subsystems can store any type of value under a (subsystem, peerid, key)
combination. A "replace" flag set during store operations forces the
PEERSTORE to replace any old values stored under the same
(subsystem, peerid, key) combination with the new value.
Additionally, an expiry date is set after which the record is
@emph{possibly} deleted by PEERSTORE.

Subsystems can iterate over all values stored under any of the following
combination of fields:

@itemize @bullet
@item (subsystem)
@item (subsystem, peerid)
@item (subsystem, key)
@item (subsystem, peerid, key)
@end itemize

Subsystems can also request to be notified about any new values stored
under a (subsystem, peerid, key) combination by sending a "watch"
request to PEERSTORE.

@node Architecture
@subsection Architecture

@c %**end of header

PEERSTORE implements the following components:

@itemize @bullet
@item PEERSTORE service: Handles store, iterate and watch operations.
@item PEERSTORE API: API to be used by other subsystems to communicate and
issue commands to the PEERSTORE service.
@item PEERSTORE plugins: Handles the persistent storage. At the moment,
only an "sqlite" plugin is implemented.
@end itemize

@cindex libgnunetpeerstore
@node libgnunetpeerstore
@subsection libgnunetpeerstore

@c %**end of header

libgnunetpeerstore is the library containing the PEERSTORE API. Subsystems
wishing to communicate with the PEERSTORE service use this API to open a
connection to PEERSTORE. This is done by calling
@code{GNUNET_PEERSTORE_connect} which returns a handle to the newly
created connection.
This handle has to be used with any further calls to the API.

To store a new record, use @code{GNUNET_PEERSTORE_store}, which
requires the record fields and a continuation function that
will be called by the API after the STORE request is sent to the
PEERSTORE service.
Note that the continuation being called does not mean that the record
was successfully stored, only that the STORE request was successfully
sent to the PEERSTORE service.
@code{GNUNET_PEERSTORE_store_cancel} can be called to cancel the STORE
request, but only before the continuation function has been called.

To iterate over stored records, use
@code{GNUNET_PEERSTORE_iterate}.
@emph{peerid} and @emph{key} can be set to NULL. An iterator
callback function will be called with each matching record found and a
NULL record at the end to signal the end of the result set.
@code{GNUNET_PEERSTORE_iterate_cancel} can be used to cancel the ITERATE
request before the iterator callback is called with a NULL record.

To be notified of new values stored under a (subsystem, peerid, key)
combination, use @code{GNUNET_PEERSTORE_watch}.
This registers the watcher with the PEERSTORE service; any new
record matching the given combination will trigger the callback
function passed to @code{GNUNET_PEERSTORE_watch}. This continues until
@code{GNUNET_PEERSTORE_watch_cancel} is called or the connection to the
service is destroyed.

After the connection is no longer needed, the function
@code{GNUNET_PEERSTORE_disconnect} can be called to disconnect from the
PEERSTORE service.
Any pending ITERATE or WATCH requests will be destroyed.
If the @code{sync_first} flag is set to @code{GNUNET_YES}, the API will
delay the disconnection until all pending STORE requests are sent to
the PEERSTORE service, otherwise, the pending STORE requests will be
destroyed as well.
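Putting the life cycle together, a hedged sketch of connect, store and disconnect might look as follows (field order follows the description above; the exact signatures live in @file{gnunet_peerstore_service.h} and may differ between versions):

```c
#include <gnunet/gnunet_peerstore_service.h>

/* Continuation: the STORE request reached the service.  This does
 * NOT yet guarantee that the record hit the database. */
static void
store_done (void *cls, int success)
{
  /* react to 'success' here */
}

static void
remember (const struct GNUNET_CONFIGURATION_Handle *cfg,
          const struct GNUNET_PeerIdentity *peer)
{
  struct GNUNET_PEERSTORE_Handle *h = GNUNET_PEERSTORE_connect (cfg);
  static const char value[] = "42";

  GNUNET_PEERSTORE_store (h,
                          "my-subsystem",          /* subsystem */
                          peer,                    /* peerid */
                          "my-key",                /* key */
                          value, sizeof (value),   /* value */
                          GNUNET_TIME_UNIT_FOREVER_ABS,   /* expiry */
                          GNUNET_PEERSTORE_STOREOPTION_REPLACE,
                          &store_done, NULL);
  /* later: GNUNET_PEERSTORE_disconnect (h, GNUNET_YES);
   * sync_first == GNUNET_YES flushes pending STOREs first */
}
```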

@cindex SET Subsystem
@node SET Subsystem
@section SET Subsystem

@c %**end of header

The SET service implements efficient set operations between two peers
over a mesh tunnel.
Currently, set union and set intersection are the only supported
operations. Elements of a set consist of an @emph{element type} and
arbitrary binary @emph{data}.
The size of an element's data is limited to around 62 KB.

@menu
* Local Sets::
* Set Modifications::
* Set Operations::
* Result Elements::
* libgnunetset::
* The SET Client-Service Protocol::
* The SET Intersection Peer-to-Peer Protocol::
* The SET Union Peer-to-Peer Protocol::
@end menu

@node Local Sets
@subsection Local Sets

@c %**end of header

Sets created by a local client can be modified and reused for multiple
operations. As each set operation requires potentially expensive special
auxiliary data to be computed for each element of a set, a set can only
participate in one type of set operation (i.e. union or intersection).
The type of a set is determined upon its creation.
If the elements of a set are needed for an operation of a different
type, all of the set's elements must be copied to a new set of the
appropriate type.

@node Set Modifications
@subsection Set Modifications

@c %**end of header

Even when set operations are active, elements can be added to and
removed from a set.
However, these changes will only be visible to operations that have been
created after the changes have taken place. That is, every set operation
only sees a snapshot of the set from the time the operation was started.
This mechanism is @emph{not} implemented by copying the whole set, but by
attaching @emph{generation information} to each element and operation.

@node Set Operations
@subsection Set Operations

@c %**end of header

Set operations can be started in two ways: Either by accepting an
operation request from a remote peer, or by requesting a set operation
from a remote peer.
Set operations are uniquely identified by the involved @emph{peers}, an
@emph{application id} and the @emph{operation type}.

The client is notified of incoming set operations by @emph{set listeners}.
A set listener listens for incoming operations of a specific operation
type and application id.
Once notified of an incoming set request, the client can accept the set
request (providing a local set for the operation) or reject it.

@node Result Elements
@subsection Result Elements

@c %**end of header

The SET service has three @emph{result modes} that determine how an
operation's result set is delivered to the client:

@itemize @bullet
@item @strong{Full Result Set.} All elements of the set resulting
from the set operation are returned to the client.
@item @strong{Added Elements.} Only elements that result from the
operation and are not already in the local peer's set are returned.
Note that for some operations (like set intersection) this result mode
will never return any elements.
This can be useful if only the remote peer is actually interested in
the result of the set operation.
@item @strong{Removed Elements.} Only elements that are in the local
peer's initial set but not in the operation's result set are returned.
Note that for some operations (like set union) this result mode will
never return any elements. This can be useful if only the remote peer
is actually interested in the result of the set operation.
@end itemize

@cindex libgnunetset
@node libgnunetset
@subsection libgnunetset

@c %**end of header

@menu
* Sets::
* Listeners::
* Operations::
* Supplying a Set::
* The Result Callback::
@end menu

@node Sets
@subsubsection Sets

@c %**end of header

New sets are created with @code{GNUNET_SET_create}. Both the local peer's
configuration (as each set has its own client connection) and the
operation type must be specified.
The set exists until either the client calls @code{GNUNET_SET_destroy} or
the client's connection to the service is disrupted.
In the latter case, the client is notified by the return value of
functions dealing with sets. This return value must always be checked.

Elements are added and removed with @code{GNUNET_SET_add_element} and
@code{GNUNET_SET_remove_element}.
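The basic life cycle above can be sketched as follows (hedged: the element layout and signatures follow @file{gnunet_set_service.h} but may differ between GNUnet versions):

```c
#include <gnunet/gnunet_set_service.h>

/* Sketch: create a union set and add one element to it. */
static void
make_set (const struct GNUNET_CONFIGURATION_Handle *cfg)
{
  struct GNUNET_SET_Handle *set;
  struct GNUNET_SET_Element el;
  static const char data[] = "hello";

  set = GNUNET_SET_create (cfg, GNUNET_SET_OPERATION_UNION);
  el.size = sizeof (data);
  el.element_type = 0;      /* application-defined element type */
  el.data = data;
  GNUNET_SET_add_element (set, &el, NULL, NULL);
  /* ... run set operations with 'set' ...
   * finally: GNUNET_SET_destroy (set); */
}
```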

@node Listeners
@subsubsection Listeners

@c %**end of header

Listeners are created with @code{GNUNET_SET_listen}. Each time a
remote peer suggests a set operation with an application id and operation
type matching a listener, the listener's callback is invoked.
The client then must synchronously call either @code{GNUNET_SET_accept}
or @code{GNUNET_SET_reject}. Note that the operation will not be started
until the client calls @code{GNUNET_SET_commit}
(see Section "Supplying a Set").

@node Operations
@subsubsection Operations

@c %**end of header

Operations to be initiated by the local peer are created with
@code{GNUNET_SET_prepare}. Note that the operation will not be started
until the client calls @code{GNUNET_SET_commit}
(see Section "Supplying a Set").

@node Supplying a Set
@subsubsection Supplying a Set

@c %**end of header

To create symmetry between the two ways of starting a set operation
(accepting and initiating it), the operation handles returned by
@code{GNUNET_SET_accept} and @code{GNUNET_SET_prepare} do not yet have a
set to operate on, thus they cannot do any work yet.

The client must call @code{GNUNET_SET_commit} to specify a set to use for
an operation. @code{GNUNET_SET_commit} may only be called once per set
operation.
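A hedged sketch of initiating an operation and then supplying the set (signatures follow @file{gnunet_set_service.h} and may differ between versions; @code{peer}, @code{app_id}, @code{set} and @code{result_cb} are assumed to exist):

```c
#include <gnunet/gnunet_set_service.h>

/* Sketch: request a union operation from a remote peer, then
 * supply the set with GNUNET_SET_commit. */
static void
run_union (const struct GNUNET_PeerIdentity *peer,
           const struct GNUNET_HashCode *app_id,
           struct GNUNET_SET_Handle *set,
           GNUNET_SET_ResultIterator result_cb)
{
  struct GNUNET_SET_OperationHandle *oh;

  oh = GNUNET_SET_prepare (peer, app_id,
                           NULL,                   /* no context message */
                           GNUNET_SET_RESULT_ADDED,
                           result_cb, NULL);
  /* nothing happens until the set is supplied: */
  GNUNET_SET_commit (oh, set);
}
```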

@node The Result Callback
@subsubsection The Result Callback

@c %**end of header

Clients must specify both a result mode and a result callback with
@code{GNUNET_SET_accept} and @code{GNUNET_SET_prepare}. The result
callback is invoked with a status indicating either that an element was
received, or that the operation failed or succeeded.
The interpretation of the received element depends on the result mode.
The callback needs to know which result mode it is used in, as the
arguments do not indicate if an element is part of the full result set,
or if it is in the difference between the original set and the final set.
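A sketch of a result callback for the ``Added Elements'' mode (hedged: status values follow @file{gnunet_set_service.h}; note that the callback itself must know which result mode is in use, since the arguments do not carry it):

```c
#include <gnunet/gnunet_set_service.h>

/* Result callback for an operation run in ADDED mode. */
static void
union_result (void *cls,
              const struct GNUNET_SET_Element *element,
              enum GNUNET_SET_Status status)
{
  switch (status)
  {
  case GNUNET_SET_STATUS_OK:
    /* 'element' is one element we were missing (ADDED mode) */
    break;
  case GNUNET_SET_STATUS_DONE:
    /* operation finished successfully */
    break;
  case GNUNET_SET_STATUS_FAILURE:
    /* operation failed; discard partial results */
    break;
  default:
    break;
  }
}
```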

@node The SET Client-Service Protocol
@subsection The SET Client-Service Protocol

@c %**end of header

@menu
* Creating Sets::
* Listeners2::
* Initiating Operations::
* Modifying Sets::
* Results and Operation Status::
* Iterating Sets::
@end menu

@node Creating Sets
@subsubsection Creating Sets

@c %**end of header

For each set of a client, there exists a client connection to the service.
Sets are created by sending the @code{GNUNET_SERVICE_SET_CREATE} message
over a new client connection. Multiple operations for one set are
multiplexed over one client connection, using a request id supplied by
the client.

@node Listeners2
@subsubsection Listeners2

@c %**end of header

Each listener also requires a separate client connection. By sending the
@code{GNUNET_SERVICE_SET_LISTEN} message, the client notifies the service
of the application id and operation type it is interested in. A client
rejects an incoming request by sending @code{GNUNET_SERVICE_SET_REJECT}
on the listener's client connection.
In contrast, when accepting an incoming request, a
@code{GNUNET_SERVICE_SET_ACCEPT} message must be sent over the client
connection of the set that is supplied for the set operation.

@node Initiating Operations
@subsubsection Initiating Operations

@c %**end of header

Operations with remote peers are initiated by sending a
@code{GNUNET_SERVICE_SET_EVALUATE} message to the service. The client
connection over which this message is sent determines the set to use.

@node Modifying Sets
@subsubsection Modifying Sets

@c %**end of header

Sets are modified with the @code{GNUNET_SERVICE_SET_ADD} and
@code{GNUNET_SERVICE_SET_REMOVE} messages.



@node Results and Operation Status
@subsubsection Results and Operation Status
@c %**end of header

The service notifies the client of result elements and success/failure of
a set operation with the @code{GNUNET_SERVICE_SET_RESULT} message.

@node Iterating Sets
@subsubsection Iterating Sets

@c %**end of header

All elements of a set can be requested by sending
@code{GNUNET_SERVICE_SET_ITER_REQUEST}. The server responds with
@code{GNUNET_SERVICE_SET_ITER_ELEMENT} and eventually terminates the
iteration with @code{GNUNET_SERVICE_SET_ITER_DONE}.
After each received element, the client
must send @code{GNUNET_SERVICE_SET_ITER_ACK}. Note that only one set
iteration may be active for a set at any given time.

@node The SET Intersection Peer-to-Peer Protocol
@subsection The SET Intersection Peer-to-Peer Protocol

@c %**end of header

The intersection protocol operates over CADET and starts with a
@code{GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST} being sent by the
peer initiating the operation to the peer listening for inbound requests.
It includes the number of elements of the initiating peer, which is used
to decide which side will send a Bloom filter first.

The listening peer checks if the operation type and application
identifier are acceptable for its current state.
If not, it responds with a @code{GNUNET_MESSAGE_TYPE_SET_RESULT} and a
status of @code{GNUNET_SET_STATUS_FAILURE} (and terminates the CADET
channel).

If the application accepts the request, the listener sends back a
@code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO} if it has
more elements in its set than the initiating peer.
Otherwise, it immediately starts with the Bloom filter exchange.
If the initiator receives a
@code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_ELEMENT_INFO} response,
it begins the Bloom filter exchange, unless the set size is indicated to
be zero, in which case the intersection is considered finished after
just the initial handshake.


@menu
* The Bloom filter exchange::
* Salt::
@end menu

@node The Bloom filter exchange
@subsubsection The Bloom filter exchange

@c %**end of header

In this phase, each peer transmits a Bloom filter over the remaining
keys of the local set to the other peer using a
@code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_BF} message. This
message additionally includes the number of elements left in the sender's
set, as well as the XOR over all of the keys in that set.

The number of bits 'k' set per element in the Bloom filter is calculated
based on the relative size of the two sets.
Furthermore, the size of the Bloom filter is calculated based on 'k' and
the number of elements in the set to maximize the amount of data filtered
per byte transmitted on the wire (while avoiding an excessively high
number of iterations).

The receiver of the message removes all elements from its local set that
do not pass the Bloom filter test.
It then checks if the sender's set size and the XOR over the keys
match what is left of its own set. If they do, it sends a
@code{GNUNET_MESSAGE_TYPE_SET_INTERSECTION_P2P_DONE} back to indicate
that the latest set is the final result.
Otherwise, the receiver starts another Bloom filter exchange, except
this time as the sender.

@node Salt
@subsubsection Salt

@c %**end of header

Bloom filter operations are probabilistic: with some non-zero probability
the test may incorrectly claim an element is in the set, even though it
is not.

To mitigate this problem, the intersection protocol iterates exchanging
Bloom filters using a different random 32-bit salt in each iteration (the
salt is also included in the message).
With different salts, set operations may fail for different elements.
By merging the results of the successive executions, the probability of
failure rapidly approaches zero.

The iterations terminate once both peers have established that they have
sets of the same size, and where the XOR over all keys computes the same
512-bit value (leaving a failure probability of @math{2^{-511}}).

@node The SET Union Peer-to-Peer Protocol
@subsection The SET Union Peer-to-Peer Protocol

@c %**end of header

The SET union protocol is based on Eppstein's efficient set reconciliation
without prior context. You should read this paper first if you want to
understand the protocol.

The union protocol operates over CADET and starts with a
@code{GNUNET_MESSAGE_TYPE_SET_P2P_OPERATION_REQUEST} being sent by the
peer initiating the operation to the peer listening for inbound requests.
It includes the number of elements of the initiating peer, which is
currently not used.

The listening peer checks if the operation type and application
identifier are acceptable for its current state. If not, it responds with
a @code{GNUNET_MESSAGE_TYPE_SET_RESULT} and a status of
@code{GNUNET_SET_STATUS_FAILURE} (and terminates the CADET channel).

If the application accepts the request, it sends back a strata estimator
using a message of type @code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_SE}. The
initiator evaluates the strata estimator and initiates the exchange of
invertible Bloom filters (IBFs), sending a
@code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF}.

During the IBF exchange, if the receiver cannot invert the Bloom filter or
detects a cycle, it sends a larger IBF in response (up to a defined
maximum limit; if that limit is reached, the operation fails).
Elements decoded while processing the IBF are transmitted to the other
peer using @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS}, or requested from
the other peer using @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS}
messages, depending on the sign observed during decoding of the IBF.
Peers respond to a @code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENT_REQUESTS}
message with the respective element in a
@code{GNUNET_MESSAGE_TYPE_SET_P2P_ELEMENTS} message. If the IBF fully
decodes, the peer responds with a
@code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_DONE} message instead of another
@code{GNUNET_MESSAGE_TYPE_SET_UNION_P2P_IBF}.

All Bloom filter operations use a salt to mingle keys before hashing them
into buckets, such that future iterations have a fresh chance of
succeeding if they previously failed due to collisions.

@cindex STATISTICS Subsystem
@node STATISTICS Subsystem
@section STATISTICS Subsystem

@c %**end of header

In GNUnet, the STATISTICS subsystem offers a central place for all
subsystems to publish unsigned 64-bit integer run-time statistics.
Keeping this information centrally means that there is a unified way for
the user to obtain data on all subsystems, and individual subsystems do
not have to always include a custom data export method for performance
metrics and other statistics. For example, the TRANSPORT system uses
STATISTICS to update information about the number of directly connected
peers and the bandwidth that has been consumed by the various plugins.
This information is valuable for diagnosing connectivity and performance
issues.

Following the GNUnet service architecture, the STATISTICS subsystem is
divided into an API which is exposed through the header
@strong{gnunet_statistics_service.h} and the STATISTICS service
@strong{gnunet-service-statistics}. The @strong{gnunet-statistics}
command-line tool can be used to obtain (and change) information about
the values stored by the STATISTICS service. The STATISTICS service does
not communicate with other peers.

Data is stored in the STATISTICS service in the form of tuples
@strong{(subsystem, name, value, persistence)}. The subsystem determines
which GNUnet subsystem the data belongs to. The name is the label under
which the value is stored; it uniquely identifies the record
among the other records belonging to the same subsystem.
In some parts of the code, the pair @strong{(subsystem, name)} is called
a @strong{statistic}, as it identifies the values stored in the STATISTICS
service. The persistence flag determines whether the record has to be
preserved across service restarts. A record is said to be persistent if
this flag is set for it; if not, the record is treated as non-persistent
and is lost after a service restart. Persistent records are written to
and read from the file @strong{statistics.data} before shutdown
and upon startup. The file is located in the HOME directory of the peer.

An anomaly of the STATISTICS service is that it does not terminate
immediately upon receiving a shutdown signal if it has any clients
connected to it. It waits for all the clients that are not monitors to
close their connections before terminating itself.
This is to prevent the loss of data during peer shutdown --- delaying the
STATISTICS service shutdown helps other services to store important data
to STATISTICS during shutdown.

@menu
* libgnunetstatistics::
* The STATISTICS Client-Service Protocol::
@end menu

@cindex libgnunetstatistics
@node libgnunetstatistics
@subsection libgnunetstatistics

@c %**end of header

@strong{libgnunetstatistics} is the library containing the API for the
STATISTICS subsystem. Any process that wants to use STATISTICS should use
this API to open a connection to the STATISTICS service.
This is done by calling the function @code{GNUNET_STATISTICS_create()}.
This function takes the name of the subsystem that is going to use
STATISTICS and a configuration.
All values written to STATISTICS with this connection will be placed in
the section corresponding to the given subsystem's name.
The connection to STATISTICS can be destroyed with the function
@code{GNUNET_STATISTICS_destroy()}. This function allows for the
connection to be destroyed immediately or upon transferring all
pending write requests to the service.
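A minimal sketch of this create/destroy pair (hedged: signatures follow @file{gnunet_statistics_service.h} but may differ between GNUnet versions; the subsystem name is illustrative):

```c
#include <gnunet/gnunet_statistics_service.h>

/* Sketch: open a STATISTICS handle for subsystem "my-subsystem"
 * and close it again, transferring pending writes first. */
static struct GNUNET_STATISTICS_Handle *stats;

static void
start_stats (const struct GNUNET_CONFIGURATION_Handle *cfg)
{
  stats = GNUNET_STATISTICS_create ("my-subsystem", cfg);
  if (NULL == stats)
  {
    /* STATISTICS is disabled in the configuration (DISABLE = YES) */
  }
}

static void
stop_stats (void)
{
  /* sync_first == GNUNET_YES: transmit pending write requests to
   * the service before destroying the connection */
  GNUNET_STATISTICS_destroy (stats, GNUNET_YES);
  stats = NULL;
}
```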

Note: the STATISTICS subsystem can be disabled by setting
@code{DISABLE = YES} under the @code{[STATISTICS]} section in the
configuration. With such a configuration, all calls to
@code{GNUNET_STATISTICS_create()} return @code{NULL}, as the STATISTICS
subsystem is unavailable, and no other functions from the API can be used.


@menu
* Statistics retrieval::
* Setting statistics and updating them::
* Watches::
@end menu

@node Statistics retrieval
@subsubsection Statistics retrieval

@c %**end of header

Once a connection to the statistics service is obtained, information
about any other subsystem that uses statistics can be retrieved with the
function @code{GNUNET_STATISTICS_get()}.
This function takes the connection handle, the name of the subsystem
whose information we are interested in (a @code{NULL} value will
retrieve information of all available subsystems using STATISTICS), the
name of the statistic we are interested in (a @code{NULL} value will
retrieve all available statistics), a continuation callback which is
called when all of requested information is retrieved, an iterator
callback which is called for each parameter in the retrieved information
and a closure for the aforementioned callbacks. The library then invokes
the iterator callback for each value matching the request.

A call to @code{GNUNET_STATISTICS_get()} is asynchronous and can be
canceled with the function @code{GNUNET_STATISTICS_get_cancel()}.
This is helpful when retrieving statistics takes too long, especially
when we want to shut down and clean up everything.

@node Setting statistics and updating them
@subsubsection Setting statistics and updating them

@c %**end of header

So far we have seen how to retrieve statistics; here we will learn how to
set statistics and update them so that other subsystems can retrieve
them.

A new statistic can be set using the function
@code{GNUNET_STATISTICS_set()}.
This function takes the name of the statistic and its value and a flag to
make the statistic persistent.
The value of the statistic should be of the type @code{uint64_t}.
The function does not take the name of the subsystem; it is determined
from the previous @code{GNUNET_STATISTICS_create()} invocation. If
the given statistic is already present, its value is overwritten.

An existing statistic can be updated, i.e. its value can be increased or
decreased by a given amount, with the function
@code{GNUNET_STATISTICS_update()}.
The parameters to this function are similar to
@code{GNUNET_STATISTICS_set()}, except that it takes the amount to be
changed (of type @code{int64_t}) instead of the absolute value.

The library will combine multiple set or update operations into one
message if the client performs requests at a rate that is faster than the
available IPC with the STATISTICS service. Thus, the client does not have
to worry about sending requests too quickly.
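The set/update pair above can be sketched as follows (hedged: signatures follow @file{gnunet_statistics_service.h}; the statistic name is illustrative and @code{stats} is a handle from @code{GNUNET_STATISTICS_create()}):

```c
#include <gnunet/gnunet_statistics_service.h>

/* Sketch: set an absolute value, then apply a relative update. */
static void
count_connection (struct GNUNET_STATISTICS_Handle *stats)
{
  /* absolute: "# peers connected" := 0, persistent across restarts */
  GNUNET_STATISTICS_set (stats, "# peers connected",
                         0, GNUNET_YES);
  /* relative: increment by one (the delta is int64_t and
   * may be negative) */
  GNUNET_STATISTICS_update (stats, "# peers connected",
                            1, GNUNET_YES);
}
```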

@node Watches
@subsubsection Watches

@c %**end of header

An interesting feature of STATISTICS is that it can serve notifications
whenever a statistic of interest is modified.
This is achieved by registering a watch through the function
@code{GNUNET_STATISTICS_watch()}.
The parameters of this function are similar to those of
@code{GNUNET_STATISTICS_get()}.
Changes to the respective statistic's value will then cause the given
iterator callback to be called.
Note: A watch can only be registered for a specific statistic. Hence
the subsystem name and the parameter name cannot be @code{NULL} in a
call to @code{GNUNET_STATISTICS_watch()}.

A registered watch will keep notifying any value changes until
@code{GNUNET_STATISTICS_watch_cancel()} is called with the same
parameters that are used for registering the watch.
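A sketch of registering such a watch (hedged: the iterator signature follows @file{gnunet_statistics_service.h} but may differ between versions; the subsystem and statistic names are illustrative):

```c
#include <gnunet/gnunet_statistics_service.h>

/* Iterator invoked on every change of the watched statistic. */
static int
on_change (void *cls, const char *subsystem,
           const char *name, uint64_t value, int is_persistent)
{
  /* react to the new 'value' here */
  return GNUNET_OK;  /* continue watching */
}

static void
watch_transport (struct GNUNET_STATISTICS_Handle *stats)
{
  /* both names must be non-NULL for a watch */
  GNUNET_STATISTICS_watch (stats, "transport",
                           "# peers connected", &on_change, NULL);
  /* later, with identical arguments:
   * GNUNET_STATISTICS_watch_cancel (stats, "transport",
   *     "# peers connected", &on_change, NULL); */
}
```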

@node The STATISTICS Client-Service Protocol
@subsection The STATISTICS Client-Service Protocol
@c %**end of header


@menu
* Statistics retrieval2::
* Setting and updating statistics::
* Watching for updates::
@end menu

@node Statistics retrieval2
@subsubsection Statistics retrieval2

@c %**end of header

To retrieve statistics, the client transmits a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_GET} containing the given subsystem
name and statistic parameter to the STATISTICS service.
The service responds with a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_VALUE} for each of the statistics
parameters that match the client's request. The end of the retrieved
information is signaled by the service by sending a message of
type @code{GNUNET_MESSAGE_TYPE_STATISTICS_END}.

@node Setting and updating statistics
@subsubsection Setting and updating statistics

@c %**end of header

The subsystem name, parameter name, its value and the persistence flag are
communicated to the service through the message
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}.

When the service receives a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET}, it retrieves the subsystem
name and checks for a statistic parameter matching the name given in
the message.
If a statistic parameter is found, the value is overwritten by the new
value from the message; if not found then a new statistic parameter is
created with the given name and value.

In addition to just setting an absolute value, it is possible to perform a
relative update by sending a message of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_SET} with an update flag
(@code{GNUNET_STATISTICS_SETFLAG_RELATIVE}) signifying that the value in
the message should be treated as an update value.

@node Watching for updates
@subsubsection Watching for updates

@c %**end of header

@code{GNUNET_STATISTICS_watch()} registers the watch at the service by
sending a message of type @code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH}.
The service then sends
notifications through messages of type
@code{GNUNET_MESSAGE_TYPE_STATISTICS_WATCH_VALUE} whenever the statistic
parameter's value is changed.

@cindex DHT
@cindex Distributed Hash Table
@node Distributed Hash Table (DHT)
@section Distributed Hash Table (DHT)

@c %**end of header

GNUnet includes a generic distributed hash table that can be used by
developers building P2P applications in the framework.
This section documents high-level features and how developers are
expected to use the DHT.
We have a research paper detailing how the DHT works.
Also, Nate's thesis includes a detailed description and performance
analysis (in chapter 6).

Key features of GNUnet's DHT include:

@itemize @bullet
@item stores key-value pairs with values up to (approximately) 63k in size
@item works with many underlay network topologies (small-world, random
graph), underlay does not need to be a full mesh / clique
@item support for extended queries (more than just a simple 'key'),
filtering duplicate replies within the network (bloomfilter) and content
validation (for details, please read the subsection on the block library)
@item can (optionally) return paths taken by the PUT and GET operations
to the application
@item provides content replication to handle churn
@end itemize

GNUnet's DHT is randomized and unreliable. Unreliable means that there is
no strict guarantee that a value stored in the DHT is always
found --- values are only found with high probability.
While this is somewhat true in all P2P DHTs, GNUnet developers should be
particularly wary of this fact (this will help you write secure,
fault-tolerant code). Thus, when writing any application using the DHT,
you should always consider the possibility that a value stored in the
DHT by you or some other peer might simply not be returned, or returned
with a significant delay.
Your application logic must be written to tolerate this (naturally, some
loss of performance or quality of service is expected in this case).

@menu
* Block library and plugins::
* libgnunetdht::
* The DHT Client-Service Protocol::
* The DHT Peer-to-Peer Protocol::
@end menu

@node Block library and plugins
@subsection Block library and plugins

@c %**end of header

@menu
* What is a Block?::
* The API of libgnunetblock::
* Queries::
* Sample Code::
* Conclusion2::
@end menu

@node What is a Block?
@subsubsection What is a Block?

@c %**end of header

Blocks are small (< 63k) pieces of data stored under a key (struct
GNUNET_HashCode). Blocks have a type (enum GNUNET_BlockType) which defines
their data format. Blocks are used in GNUnet as units of static data
exchanged between peers and stored (or cached) locally.
Uses of blocks include file-sharing (the files are broken up into blocks),
the VPN (DNS information is stored in blocks) and the DHT (all
information in the DHT and meta-information for the maintenance of the
DHT are both stored using blocks).
The block subsystem provides a few common functions that must be
available for any type of block.

@cindex libgnunetblock API
@node The API of libgnunetblock
@subsubsection The API of libgnunetblock

@c %**end of header

The block library requires for each (family of) block type(s) a block
plugin (implementing @file{gnunet_block_plugin.h}) that provides basic
functions that are needed by the DHT (and possibly other subsystems) to
manage the block.
These block plugins are typically implemented within their respective
subsystems.
The main block library is then used to locate, load and query the
appropriate block plugin.
Which plugin is appropriate is determined by the block type (which is
just a 32-bit integer). Block plugins contain code that specifies which
block types are supported by a given plugin. The block library loads all
block plugins that are installed at the local peer and forwards the
application request to the respective plugin.

The central functions of the block APIs (plugin and main library) are to
allow the mapping of blocks to their respective key (if possible) and the
ability to check that a block is well-formed and matches a given
request (again, if possible).
This way, GNUnet can avoid storing invalid blocks, storing blocks under
the wrong key and forwarding blocks in response to a query that they do
not answer.
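
The two central operations can be sketched in plain C as follows. This is
an illustrative toy, not the actual @file{gnunet_block_plugin.h} API; all
names (@code{ToyBlockPlugin}, @code{toy_get_key}, @code{toy_check_block})
are hypothetical, and the key is shortened to a 32-bit checksum instead of
a @code{struct GNUNET_HashCode}:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy analogue of a block plugin: derive the key from a block and
 * check a block against a query (hypothetical API, for illustration). */
struct ToyBlockPlugin
{
  /* Derive the key a block must be stored under; returns 0 if the
   * block type does not allow deriving the key. */
  int (*get_key) (const void *block, size_t size, uint32_t *key);
  /* Check that a block is well-formed and answers the given query. */
  int (*check_block) (const void *block, size_t size, uint32_t query_key);
};

/* A trivial plugin where the key is a checksum over the block data. */
static int
toy_get_key (const void *block, size_t size, uint32_t *key)
{
  const uint8_t *b = block;
  uint32_t sum = 0;

  for (size_t i = 0; i < size; i++)
    sum = sum * 31 + b[i];
  *key = sum;
  return 1;
}

static int
toy_check_block (const void *block, size_t size, uint32_t query_key)
{
  uint32_t key;

  if (! toy_get_key (block, size, &key))
    return 0;
  return key == query_key; /* reject blocks stored under the wrong key */
}
```

A caller (such as a toy DHT) would use @code{get_key} before storing a
block and @code{check_block} before forwarding a reply.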

One key function of the block plugins is to allow GNUnet to detect
duplicate replies (via the Bloom filter). All plugins MUST support
detecting duplicate replies (by adding the current response to the
Bloom filter and rejecting it if it is encountered again).
If a plugin fails to do this, responses may loop in the network.
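
This duplicate detection can be sketched in self-contained C as follows.
This is a toy (hypothetical @code{SimpleBloomFilter} and
@code{bf_test_and_add} names, a 1024-bit filter with a single hash
function), not the GNUnet Bloom filter API; it also previews the
"mutator" value discussed under Queries below, which re-parameterizes the
hash so that a later query can let filtered false-positives through:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BF_BITS 1024

/* Toy Bloom filter: one bit array, one hash function. */
struct SimpleBloomFilter
{
  uint8_t bits[BF_BITS / 8];
};

/* Combine the block hash with the mutator; a different mutator yields
 * a different bit position for the same block. */
static uint32_t
mutate_hash (uint32_t block_hash, uint32_t mutator)
{
  return (block_hash ^ mutator) * 2654435761u; /* multiplicative mix */
}

/* Add the current response to the filter; returns 1 if it was already
 * present (a duplicate that MUST be rejected), 0 otherwise. */
static int
bf_test_and_add (struct SimpleBloomFilter *bf,
                 uint32_t block_hash,
                 uint32_t mutator)
{
  uint32_t h = mutate_hash (block_hash, mutator) % BF_BITS;
  int seen = (bf->bits[h / 8] >> (h % 8)) & 1;

  bf->bits[h / 8] |= (uint8_t) (1 << (h % 8));
  return seen;
}
```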

@node Queries
@subsubsection Queries
@c %**end of header

The query format for any block in GNUnet consists of four main components.
First, the type of the desired block must be specified. Second, the query
must contain a hash code. The hash code is used for lookups in hash
tables and databases and need not be unique to the block (however, if
possible a unique hash should be used, as this is best for
performance).
Third, an optional Bloom filter can be specified to exclude known results;
replies that hash to the bits set in the Bloom filter are considered
invalid. False-positives can be eliminated by sending the same query
again with a different Bloom filter mutator value, which parameterizes
the hash function that is used.
Finally, an optional application-specific "eXtended query" (xquery) can
be specified to further constrain the results. It is entirely up to
the type-specific plugin to determine whether or not a given block
matches a query (type, hash, Bloom filter, and xquery).
Naturally, not all xqueries are valid and some types of blocks may not
support Bloom filters either, so the plugin also needs to check if the
query is valid in the first place.

Depending on the results from the plugin, the DHT will then discard the
(invalid) query, forward the query, discard the (invalid) reply, cache the
(valid) reply, and/or forward the (valid and non-duplicate) reply.

@node Sample Code
@subsubsection Sample Code

@c %**end of header

The source code in @strong{plugin_block_test.c} is a good starting point
for new block plugins --- it does the minimal work by implementing a
plugin that performs no validation at all.
The respective @strong{Makefile.am} shows how to build and install a
block plugin.

@node Conclusion2
@subsubsection Conclusion2

@c %**end of header

In conclusion, GNUnet subsystems that want to use the DHT need to define a
block format and write a plugin to match queries and replies. For testing,
the @code{GNUNET_BLOCK_TYPE_TEST} block type can be used; it accepts
any query as valid and any reply as matching any query.
This type is also used for the DHT command line tools.
However, it should NOT be used for normal applications due to the lack
of error checking that results from this primitive implementation.

@cindex libgnunetdht
@node libgnunetdht
@subsection libgnunetdht

@c %**end of header

The DHT API itself is pretty simple and offers the usual GET and PUT
functions that work as expected. The specified block type refers to the
block library which allows the DHT to run application-specific logic for
data stored in the network.


@menu
* GET::
* PUT::
* MONITOR::
* DHT Routing Options::
@end menu

@node GET
@subsubsection GET

@c %**end of header

When using GET, the main consideration for developers (other than the
block library) should be that after issuing a GET, the DHT will
continuously cause (small amounts of) network traffic until the operation
is explicitly canceled.
So GET does not simply send out a single network request once; instead,
the DHT will continue to search for data. This is needed to achieve good
success rates and also handles the case where the respective PUT
operation happens after the GET operation was started.
Developers should not cancel an existing GET operation and then
explicitly re-start it to trigger a new round of network requests;
this is simply inefficient, especially as the internal automated version
can be more efficient, for example by filtering results in the network
that have already been returned.

If an application that performs a GET request has a set of replies that it
already knows and would like to filter, it can call
@code{GNUNET_DHT_get_filter_known_results} with an array of hashes over
the respective blocks to tell the DHT that these results are not
desired (any more).
This way, the DHT will filter the respective blocks using the block
library in the network, which may result in a significant reduction in
bandwidth consumption.

@node PUT
@subsubsection PUT

@c %**end of header

In contrast to GET operations, developers @strong{must} manually re-run
PUT operations periodically (if they intend the content to continue to be
available). Content stored in the DHT expires or might be lost due to
churn.
Furthermore, GNUnet's DHT typically requires multiple rounds of PUT
operations before a key-value pair is consistently available to all
peers (the DHT randomizes paths and thus storage locations, and only
after multiple rounds of PUTs there will be a sufficient number of
replicas in large DHTs). An explicit PUT operation using the DHT API will
only cause network traffic once, so in order to ensure basic availability
and resistance to churn (and adversaries), PUTs must be repeated.
While the exact frequency depends on the application, a rule of thumb is
that there should be at least a dozen PUT operations within the content
lifetime. Content in the DHT typically expires after one day, so
DHT PUT operations should be repeated at least every 1-2 hours.
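
The arithmetic behind this rule of thumb can be written out explicitly
(a sketch; the constant names are ours, not GNUnet's):

```c
#include <assert.h>

/* Content in the DHT typically expires after one day. */
#define CONTENT_LIFETIME_S (24 * 60 * 60)
/* Rule of thumb: at least a dozen PUTs within the content lifetime. */
#define MIN_PUT_ROUNDS 12

/* Upper bound on the interval between two PUT re-publications. */
static int
republish_interval_s (void)
{
  return CONTENT_LIFETIME_S / MIN_PUT_ROUNDS;
}
```

86400 seconds divided by 12 rounds gives 7200 seconds, i.e. the
two-hour upper bound stated above.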

@node MONITOR
@subsubsection MONITOR

@c %**end of header

The DHT API also allows applications to monitor messages crossing the
local DHT service.
The types of messages used by the DHT are GET, PUT and RESULT messages.
Using the monitoring API, applications can choose to monitor these
requests, possibly limiting themselves to requests for a particular block
type.

The monitoring API is not only useful for diagnostics; it can also be
used to trigger application operations based on PUT operations.
For example, an application may use PUTs to distribute work requests to
other peers.
The workers would then monitor for PUTs that give them work, instead of
looking for work using GET operations.
This can be beneficial, especially if the workers have no good way to
guess the keys under which work would be stored.
Naturally, additional protocols might be needed to ensure that the desired
number of workers will process the distributed workload.

@node DHT Routing Options
@subsubsection DHT Routing Options

@c %**end of header

There are four routing options for GET and PUT requests, of which the
first two are relevant for applications:

@table @asis
@item GNUNET_DHT_RO_DEMULTIPLEX_EVERYWHERE This option means that all
peers should process the request, even if their peer ID is not closest to
the key. For a PUT request, this means that all peers that a request
traverses may make a copy of the data.
Similarly for a GET request, all peers will check their local database
for a result. Setting this option can thus significantly improve caching
and reduce bandwidth consumption --- at the expense of a larger DHT
database. If in doubt, we recommend using this option.
@item GNUNET_DHT_RO_RECORD_ROUTE This option instructs the DHT to record
the path that a GET or a PUT request is taking through the overlay
network. The resulting paths are then returned to the application with
the respective result. This allows the receiver of a result to construct
a path to the originator of the data, which might then be used for
routing. Naturally, setting this option requires additional bandwidth
and disk space, so applications should only set this if the paths are
needed by the application logic.
@item GNUNET_DHT_RO_FIND_PEER This option is an internal option used by
the DHT's peer discovery mechanism and should not be used by applications.
@item GNUNET_DHT_RO_BART This option is currently not implemented. It may
in the future offer performance improvements for clique topologies.
@end table

@node The DHT Client-Service Protocol
@subsection The DHT Client-Service Protocol

@c %**end of header

@menu
* PUTting data into the DHT::
* GETting data from the DHT::
* Monitoring the DHT::
@end menu

@node PUTting data into the DHT
@subsubsection PUTting data into the DHT

@c %**end of header

To store (PUT) data into the DHT, the client sends a
@code{struct GNUNET_DHT_ClientPutMessage} to the service.
This message specifies the block type, routing options, the desired
replication level, the expiration time, key,
value and a 64-bit unique ID for the operation. The service responds with
a @code{struct GNUNET_DHT_ClientPutConfirmationMessage} with the same
64-bit unique ID. Note that the service sends the confirmation as soon as
it has locally processed the PUT request. The PUT may still be
propagating through the network at this time.

In the future, we may want to change this to provide (limited) feedback
to the client, for example if we detect that the PUT operation had no
effect because the same key-value pair was already stored in the DHT.
However, changing this would also require additional state and messages
in the P2P interaction.

@node GETting data from the DHT
@subsubsection GETting data from the DHT

@c %**end of header

To retrieve (GET) data from the DHT, the client sends a
@code{struct GNUNET_DHT_ClientGetMessage} to the service. The message
specifies routing options, a replication level (for replicating the GET,
not the content), the desired block type, the key, the (optional)
extended query and unique 64-bit request ID.

Additionally, the client may send any number of
@code{struct GNUNET_DHT_ClientGetResultSeenMessage}s to notify the
service about results that the client is already aware of.
These messages consist of the key, the unique 64-bit ID of the request,
and an arbitrary number of hash codes over the blocks that the client is
already aware of. As messages are restricted to 64k, a client that
already knows more than about a thousand blocks may need to send
several of these messages. Naturally, the client should transmit these
messages as quickly as possible after the original GET request such that
the DHT can filter those results in the network early on. Naturally, as
these messages are sent after the original request, it is conceivable
that the DHT service may return blocks that match those already known
to the client anyway.
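
The "about a thousand blocks" figure follows from the message size limit:
a @code{GNUNET_HashCode} is 512 bits (64 bytes) and messages are capped at
64k. The sketch below works this out; the fixed header size used here is
an assumption for illustration, not the actual size of
@code{struct GNUNET_DHT_ClientGetResultSeenMessage}:

```c
#include <assert.h>

#define MAX_MESSAGE_SIZE 65536 /* 64k message size limit */
#define HASHCODE_SIZE 64       /* 512-bit GNUNET_HashCode */
#define ASSUMED_HEADER_SIZE 48 /* hypothetical fixed message overhead */

/* How many known-result hashes fit into one message. */
static int
max_hashes_per_message (void)
{
  return (MAX_MESSAGE_SIZE - ASSUMED_HEADER_SIZE) / HASHCODE_SIZE;
}
```

With these numbers, one message carries on the order of a thousand
hashes, so a client knowing more results than that must send several
messages.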

In response to a GET request, the service will send @code{struct
GNUNET_DHT_ClientResultMessage}s to the client. These messages contain the
block type, expiration, key, unique ID of the request and of course the
value (a block). Depending on the options set for the respective
operations, the replies may also contain the path the GET and/or the PUT
took through the network.

A client can stop receiving replies either by disconnecting or by sending
a @code{struct GNUNET_DHT_ClientGetStopMessage} which must contain the
key and the 64-bit unique ID of the original request. Using an
explicit "stop" message is more common as this allows a client to run
many concurrent GET operations over the same connection with the DHT
service --- and to stop them individually.

@node Monitoring the DHT
@subsubsection Monitoring the DHT

@c %**end of header

To begin monitoring, the client sends a
@code{struct GNUNET_DHT_MonitorStartStop} message to the DHT service.
In this message, flags can be set to enable (or disable) monitoring of
GET, PUT and RESULT messages that pass through a peer. The message can
also restrict monitoring to a particular block type or a particular key.
Once monitoring is enabled, the DHT service will notify the client about
any matching event using @code{struct GNUNET_DHT_MonitorGetMessage}s for
GET events, @code{struct GNUNET_DHT_MonitorPutMessage} for PUT events
and @code{struct GNUNET_DHT_MonitorGetRespMessage} for RESULTs. Each of
these messages contains all of the information about the event.

@node The DHT Peer-to-Peer Protocol
@subsection The DHT Peer-to-Peer Protocol
@c %**end of header


@menu
* Routing GETs or PUTs::
* PUTting data into the DHT2::
* GETting data from the DHT2::
@end menu

@node Routing GETs or PUTs
@subsubsection Routing GETs or PUTs

@c %**end of header

When routing GETs or PUTs, the DHT service selects a suitable subset of
neighbours for forwarding. The exact number of neighbours can be zero or
more and depends on the hop counter of the query (initially zero) in
relation to the (log of) the network size estimate, the desired
replication level and the peer's connectivity.
Depending on the hop counter and our network size estimate, the selection
of the peers may be randomized or based on proximity to the key.
Furthermore, requests include a set of peers that a request has already
traversed; those peers are also excluded from the selection.

@node PUTting data into the DHT2
@subsubsection PUTting data into the DHT2

@c %**end of header

To PUT data into the DHT, the service sends a @code{struct PeerPutMessage}
of type @code{GNUNET_MESSAGE_TYPE_DHT_P2P_PUT} to the respective
neighbour.
In addition to the usual information about the content (type, routing
options, desired replication level for the content, expiration time, key
and value), the message contains a fixed-size Bloom filter with
information about which peers (may) have already seen this request.
This Bloom filter is used to ensure that DHT messages never loop back to
a peer that has already processed the request.
Additionally, the message includes the current hop counter and, depending
on the routing options, the message may include the full path that the
message has taken so far.
The Bloom filter should already contain the identity of the previous hop;
however, the path should not include the identity of the previous hop and
the receiver should append the identity of the sender to the path, not
its own identity (this is done to reduce bandwidth).
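
The path bookkeeping described above can be sketched with toy types
(hypothetical names and plain integers instead of
@code{struct GNUNET_PeerIdentity}): each receiver appends the
@emph{sender's} identity, so every hop is recorded exactly once and the
sender never has to transmit its own identity.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_PATH 16

/* Toy recorded path: peer identities, oldest hop first. */
struct ToyPath
{
  unsigned int peers[MAX_PATH];
  size_t len;
};

/* Called by the receiver when processing a PUT: record the identity
 * of the previous hop (the sender), not our own identity. */
static void
path_append_sender (struct ToyPath *path, unsigned int sender_id)
{
  if (path->len < MAX_PATH)
    path->peers[path->len++] = sender_id;
}
```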

@node GETting data from the DHT2
@subsubsection GETting data from the DHT2

@c %**end of header

A peer can search the DHT by sending @code{struct PeerGetMessage}s of type
@code{GNUNET_MESSAGE_TYPE_DHT_P2P_GET} to other peers. In addition to the
usual information about the request (type, routing options, desired
replication level for the request, the key and the extended query), a GET
request also again contains a hop counter, a Bloom filter over the peers
that have processed the request already and depending on the routing
options the full path traversed by the GET.
Finally, a GET request includes a variable-size second Bloom filter and a
so-called Bloom filter mutator value which together indicate which
replies the sender has already seen. During the lookup, each block that
matches the block type, key and extended query is additionally subjected
to a test against this Bloom filter.
The block plugin is expected to take the hash of the block and combine it
with the mutator value and check if the result is not yet in the Bloom
filter. The originator of the query will from time to time modify the
mutator to (eventually) allow false-positives filtered by the Bloom filter
to be returned.

Peers that receive a GET request perform a local lookup (depending on
their proximity to the key and the query options) and forward the request
to other peers.
They then remember the request (including the Bloom filter for blocking
duplicate results) and when they obtain a matching, non-filtered response
a @code{struct PeerResultMessage} of type
@code{GNUNET_MESSAGE_TYPE_DHT_P2P_RESULT} is forwarded to the previous
hop.
Whenever a result is forwarded, the block plugin is used to update the
Bloom filter accordingly, to ensure that the same result is never
forwarded more than once.
The DHT service may also cache forwarded results locally if the
"CACHE_RESULTS" option is set to "YES" in the configuration.

@cindex GNS
@cindex GNU Name System
@node GNU Name System (GNS)
@section GNU Name System (GNS)

@c %**end of header

The GNU Name System (GNS) is a decentralized database that enables users
to securely resolve names to values.
Names can be used to identify other users (for example, in social
networking), or network services (for example, VPN services running at a
peer in GNUnet, or purely IP-based services on the Internet).
Users interact with GNS by typing in a hostname that ends in ".gnu"
or ".zkey".

Videos giving an overview of most of GNS and the motivations behind
it are available here and here.
The remainder of this chapter targets developers that are familiar with
high level concepts of GNS as presented in these talks.
@c TODO: Add links to here and here and to these.

GNS-aware applications should use the GNS resolver to obtain the
respective records that are stored under that name in GNS.
Each record consists of a type, value, expiration time and flags.

The type specifies the format of the value. Types below 65536 correspond
to DNS record types, larger values are used for GNS-specific records.
Applications can define new GNS record types by reserving a number and
implementing a plugin (which mostly needs to convert the binary value
representation to a human-readable text format and vice-versa).
The expiration time specifies how long the record is to be valid.
The GNS API ensures that applications are only given non-expired values.
The flags are typically irrelevant for applications, as GNS uses them
internally to control visibility and validity of records.

Records are stored along with a signature.
The signature is generated using the private key of the authoritative
zone. This allows any GNS resolver to verify the correctness of a
name-value mapping.

Internally, GNS uses the NAMECACHE to cache information obtained from
other users, the NAMESTORE to store information specific to the local
users, and the DHT to exchange data between users.
A plugin API is used to enable applications to define new GNS
record types.

@menu
* libgnunetgns::
* libgnunetgnsrecord::
* GNS plugins::
* The GNS Client-Service Protocol::
* Hijacking the DNS-Traffic using gnunet-service-dns::
* Serving DNS lookups via GNS on W32::
@end menu

@node libgnunetgns
@subsection libgnunetgns

@c %**end of header

The GNS API itself is extremely simple. Clients first connect to the
GNS service using @code{GNUNET_GNS_connect}.
They can then perform lookups using @code{GNUNET_GNS_lookup} or cancel
pending lookups using @code{GNUNET_GNS_lookup_cancel}.
Once finished, clients disconnect using @code{GNUNET_GNS_disconnect}.

@menu
* Looking up records::
* Accessing the records::
* Creating records::
* Future work::
@end menu

@node Looking up records
@subsubsection Looking up records

@c %**end of header

@code{GNUNET_GNS_lookup} takes a number of arguments:

@table @asis
@item handle This is simply the GNS connection handle from
@code{GNUNET_GNS_connect}.
@item name The client needs to specify the name to
be resolved. This can be any valid DNS or GNS hostname.
@item zone The client
needs to specify the public key of the GNS zone against which the
resolution should be done (the ".gnu" zone).
Note that a key must be provided, even if the name ends in ".zkey".
This should typically be the public key of the master-zone of the user.
@item type This is the desired GNS or DNS record type
to look for. While all records for the given name will be returned, this
can be important if the client wants to resolve record types that
themselves delegate resolution, such as CNAME, PKEY or GNS2DNS.
Resolving a record of any of these types will only work if the respective
record type is specified in the request, as the GNS resolver will
otherwise follow the delegation and return the records from the
respective destination, instead of the delegating record.
@item only_cached This argument should typically be set to
@code{GNUNET_NO}. Setting it to @code{GNUNET_YES} disables resolution via
the overlay network.
@item shorten_zone_key If GNS encounters new names during resolution,
their respective zones can automatically be learned and added to the
"shorten zone". If this is desired, clients must pass the private key of
the shorten zone. If NULL is passed, shortening is disabled.
@item proc This argument identifies
the function to call with the result. It is given proc_cls, the number of
records found (possibly zero) and the array of the records as arguments.
proc will only be called once. After proc has been called, the lookup
must no longer be cancelled.
@item proc_cls The closure for proc.
@end table

@node Accessing the records
@subsubsection Accessing the records

@c %**end of header

The @code{libgnunetgnsrecord} library provides an API to manipulate the
GNS record array that is given to proc. In particular, it offers
functions such as converting record values to human-readable
strings (and back). However, most @code{libgnunetgnsrecord} functions are
not interesting to GNS client applications.

For DNS records, the @code{libgnunetdnsparser} library provides
functions for parsing (and serializing) common types of DNS records.

@node Creating records
@subsubsection Creating records

@c %**end of header

Creating GNS records is typically done by building the respective record
information (possibly with the help of @code{libgnunetgnsrecord} and
@code{libgnunetdnsparser}) and then using the @code{libgnunetnamestore} to
publish the information. The GNS API is not involved in this
operation.

@node Future work
@subsubsection Future work

@c %**end of header

In the future, we want to expand @code{libgnunetgns} to allow
applications to observe shortening operations performed during GNS
resolution, for example so that users can receive visual feedback when
this happens.

@node libgnunetgnsrecord
@subsection libgnunetgnsrecord

@c %**end of header

The @code{libgnunetgnsrecord} library is used to manipulate GNS
records (in plaintext or in their encrypted format).
Applications mostly interact with @code{libgnunetgnsrecord} by using the
functions to convert GNS record values to strings or vice-versa, or to
lookup a GNS record type number by name (or vice-versa).
The library also provides various other functions that are mostly
used internally within GNS, such as converting keys to names, checking for
expiration, encrypting GNS records to GNS blocks, verifying GNS block
signatures and decrypting GNS records from GNS blocks.

We will now discuss the four commonly used functions of the API.
@code{libgnunetgnsrecord} does not perform these operations itself,
but instead uses plugins to perform the operation.
GNUnet includes plugins to support common DNS record types as well as
standard GNS record types.

@menu
* Value handling::
* Type handling::
@end menu

@node Value handling
@subsubsection Value handling

@c %**end of header

@code{GNUNET_GNSRECORD_value_to_string} can be used to convert
the (binary) representation of a GNS record value to a human readable,
0-terminated UTF-8 string.
NULL is returned if the specified record type is not supported by any
available plugin.

@code{GNUNET_GNSRECORD_string_to_value} can be used to try to convert a
human readable string to the respective (binary) representation of
a GNS record value.
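
For a DNS A record, these two conversions amount to translating between a
4-byte IPv4 address and its dotted-decimal form. The sketch below shows
the idea using plain POSIX @code{inet_ntop}/@code{inet_pton}; the
function names are hypothetical and this is not the GNUnet plugin API:

```c
#include <arpa/inet.h>
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Binary A-record value -> 0-terminated string; NULL if malformed. */
static const char *
a_record_value_to_string (const void *value, size_t size,
                          char *buf, size_t buf_size)
{
  if (size != sizeof (struct in_addr))
    return NULL; /* value does not have the expected size */
  return inet_ntop (AF_INET, value, buf, buf_size);
}

/* Human-readable string -> binary A-record value; 1 on success. */
static int
a_record_string_to_value (const char *s, struct in_addr *out)
{
  return 1 == inet_pton (AF_INET, s, out);
}
```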

@node Type handling
@subsubsection Type handling

@c %**end of header

@code{GNUNET_GNSRECORD_typename_to_number} can be used to obtain the
numeric value associated with a given typename. For example, given the
typename "A" (for DNS A records), the function will return the number 1.
A list of common DNS record types is
@uref{http://en.wikipedia.org/wiki/List_of_DNS_record_types, here}.
Note that not all DNS record types are supported by GNUnet GNSRECORD
plugins at this time.

@code{GNUNET_GNSRECORD_number_to_typename} can be used to obtain the
typename associated with a given numeric value.
For example, given the type number 1, the function will return the
typename "A".
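
Conceptually, this pair of functions is a table lookup, as in the
self-contained sketch below (hypothetical names; the real lookup is
delegated to the GNSRECORD plugins, and only a few well-known DNS types
are listed here):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct TypeEntry
{
  const char *name;
  unsigned int number;
};

/* A few well-known DNS record types (see the list referenced above). */
static const struct TypeEntry dns_types[] = {
  { "A", 1 }, { "NS", 2 }, { "CNAME", 5 }, { "AAAA", 28 }, { NULL, 0 }
};

/* Typename -> number; 0 signals "unknown" in this toy. */
static unsigned int
typename_to_number (const char *name)
{
  for (const struct TypeEntry *e = dns_types; NULL != e->name; e++)
    if (0 == strcmp (e->name, name))
      return e->number;
  return 0;
}

/* Number -> typename; NULL signals "unknown" in this toy. */
static const char *
number_to_typename (unsigned int number)
{
  for (const struct TypeEntry *e = dns_types; NULL != e->name; e++)
    if (number == e->number)
      return e->name;
  return NULL;
}
```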

@node GNS plugins
@subsection GNS plugins

@c %**end of header

Adding a new GNS record type typically involves writing (or extending) a
GNSRECORD plugin. The plugin needs to implement the
@code{gnunet_gnsrecord_plugin.h} API which provides basic functions that
are needed by GNSRECORD to convert typenames and values of the respective
record type to strings (and back).
These gnsrecord plugins are typically implemented within their respective
subsystems.
Examples for such plugins can be found in the GNSRECORD, GNS and
CONVERSATION subsystems.

The @code{libgnunetgnsrecord} library is then used to locate, load and
query the appropriate gnsrecord plugin.
Which plugin is appropriate is determined by the record type (which is
just a 32-bit integer). The @code{libgnunetgnsrecord} library loads all
gnsrecord plugins that are installed at the local peer and forwards the
application request to the plugins. If the record type is not
supported by the plugin, it should simply return an error code.

The central functions of the gnsrecord APIs (plugin and main library) are
the same four functions for converting between values and strings, and
typenames and numbers documented in the previous subsection.

@node The GNS Client-Service Protocol
@subsection The GNS Client-Service Protocol
@c %**end of header

The GNS client-service protocol consists of two simple messages, the
@code{LOOKUP} message and the @code{LOOKUP_RESULT}. Each @code{LOOKUP}
message contains a unique 32-bit identifier, which will be included in the
corresponding response. Thus, clients can send many lookup requests in
parallel and receive responses out-of-order.
A @code{LOOKUP} request also includes the public key of the GNS zone,
the desired record type and fields specifying whether shortening is
enabled or networking is disabled. Finally, the @code{LOOKUP} message
includes the name to be resolved.

The response includes the number of records and the records themselves
in the format created by @code{GNUNET_GNSRECORD_records_serialize}.
They can thus be deserialized using
@code{GNUNET_GNSRECORD_records_deserialize}.

@node Hijacking the DNS-Traffic using gnunet-service-dns
@subsection Hijacking the DNS-Traffic using gnunet-service-dns

@c %**end of header

This section documents how the gnunet-service-dns (and the
gnunet-helper-dns) intercepts DNS queries from the local system.
This is merely one method of obtaining GNS queries.
It is also possible to change @code{resolv.conf} to point to a machine
running @code{gnunet-dns2gns} or to modify libc's name system switch
(NSS) configuration to include a GNS resolution plugin.
The method described in this chapter is more of a last-ditch catch-all
approach.

@code{gnunet-service-dns} enables intercepting DNS traffic using policy
based routing.
We MARK every outgoing DNS-packet if it was not sent by our application.
Using a second routing table in the Linux kernel these marked packets are
then routed through our virtual network interface and can thus be
captured unchanged.

Our application then reads the query and decides how to handle it: A
query to an address ending in ".gnu" or ".zkey" is hijacked by
@code{gnunet-service-gns} and resolved internally using GNS.
In the future, a reverse query for an address of the configured virtual
network could be answered with records kept about previous forward
queries.
Queries that are not hijacked by some application using the DNS service
will be sent to the original recipient.
The answer to the query will always be sent back through the virtual
interface with the original nameserver as source address.


@menu
* Network Setup Details::
@end menu

@node Network Setup Details
@subsubsection Network Setup Details

@c %**end of header

The DNS interceptor adds the following rules to the Linux kernel:
@example
iptables -t mangle -I OUTPUT 1 -p udp --sport $LOCALPORT --dport 53 \
  -j ACCEPT
iptables -t mangle -I OUTPUT 2 -p udp --dport 53 -j MARK --set-mark 3
ip rule add fwmark 3 table2
ip route add default via $VIRTUALDNS table2
@end example

The first command makes sure that all packets coming from a port our
application opened beforehand (@code{$LOCALPORT}) will be routed
normally. The second command marks every other packet destined for a
DNS server with mark 3 (chosen arbitrarily). The last two commands add
a routing policy based on this mark 3: marked packets are looked up in
a second routing table, which routes them through our virtual DNS
interface (@code{$VIRTUALDNS}).

@node Serving DNS lookups via GNS on W32
@subsection Serving DNS lookups via GNS on W32

@c %**end of header

This section documents how the libw32nsp (and
gnunet-gns-helper-service-w32) do DNS resolutions of DNS queries on the
local system. This only applies to GNUnet running on W32.

W32 has a concept of "Namespaces" and "Namespace providers".
These are used to present various name systems to applications in a
generic way.
Namespaces include DNS, mDNS, NLA and others. For each namespace any
number of providers can be registered, and they are queried in order
of priority (which is adjustable).

Applications can resolve names by using the WSALookupService*() family of
functions.

However, these are WSA-only facilities. Common BSD socket functions for
namespace resolutions are gethostbyname and getaddrinfo (among others).
These functions are implemented internally (by default - by mswsock,
which also implements the default DNS provider) as wrappers around
WSALookupService*() functions (see "Sample Code for a Service Provider"
on MSDN).

On W32 GNUnet builds a libw32nsp - a namespace provider, which can then be
installed into the system by using w32nsp-install (and uninstalled by
w32nsp-uninstall), as described in "Installation Handbook".

libw32nsp is very simple and has almost no dependencies. As a response to
NSPLookupServiceBegin(), it only checks that the provider GUID passed to
it by the caller matches the GNUnet DNS Provider GUID, checks that the
name being resolved ends in ".gnu" or ".zkey", then connects to
gnunet-gns-helper-service-w32 at 127.0.0.1:5353 (hardcoded) and sends the
name resolution request there, returning the connected socket to the
caller.

When the caller invokes NSPLookupServiceNext(), libw32nsp reads a
completely formed reply from that socket, unmarshalls it, then gives
it back to the caller.

At the moment gnunet-gns-helper-service-w32 is implemented to only ever
give one reply, and subsequent calls to NSPLookupServiceNext() will fail
with WSA_NODATA (first call to NSPLookupServiceNext() might also fail if
GNS failed to find the name, or there was an error connecting to it).

gnunet-gns-helper-service-w32 does most of the processing:

@itemize @bullet
@item Maintains a connection to GNS.
@item Reads GNS config and loads appropriate keys.
@item Checks service GUID and decides on the type of record to look up,
refusing to make a lookup outright when an unsupported service GUID is
passed.
@item Launches the lookup
@end itemize

When the lookup result arrives, gnunet-gns-helper-service-w32 forms a complete
reply (including filling a WSAQUERYSETW structure and, possibly, a binary
blob with a hostent structure for gethostbyname() client), marshalls it,
and sends it back to libw32nsp. If no records were found, it sends an
empty header.

This works for most normal applications that use gethostbyname() or
getaddrinfo() to resolve names, but fails to do anything with
applications that use alternative means of resolving names (such as
sending queries to a DNS server directly by themselves).
This includes some of well known utilities, like "ping" and "nslookup".

@cindex GNS Namecache
@node GNS Namecache
@section GNS Namecache

@c %**end of header

The NAMECACHE subsystem is responsible for caching (encrypted) resolution
results of the GNU Name System (GNS). GNS makes zone information available
to other users via the DHT. However, as accessing the DHT for every
lookup is expensive (and as the DHT's local cache is lost whenever the
peer is restarted), GNS uses the NAMECACHE as a more persistent cache for
DHT lookups.
Thus, instead of always looking up every name in the DHT, GNS first
checks if the result is already available locally in the NAMECACHE.
Only if there is no result in the NAMECACHE, GNS queries the DHT.
The NAMECACHE stores data in the same (encrypted) format as the DHT.
It thus makes no sense to iterate over all items in the
NAMECACHE --- the NAMECACHE does not have a way to provide the keys
required to decrypt the entries.

Blocks in the NAMECACHE share the same expiration mechanism as blocks in
the DHT --- the block expires whenever any of the records in
the (encrypted) block expires.
The expiration time of the block is the only information stored in
plaintext. The NAMECACHE service internally performs all of the required
work to expire blocks; clients do not have to worry about this.
Also, given that NAMECACHE stores only GNS blocks that local users
requested, there is no configuration option to limit the size of the
NAMECACHE. It is assumed to be always small enough (a few MB) to fit on
the drive.

The NAMECACHE supports the use of different database backends via a
plugin API.

@menu
* libgnunetnamecache::
* The NAMECACHE Client-Service Protocol::
* The NAMECACHE Plugin API::
@end menu

@node libgnunetnamecache
@subsection libgnunetnamecache

@c %**end of header

The NAMECACHE API consists of five simple functions. First, there is
@code{GNUNET_NAMECACHE_connect} to connect to the NAMECACHE service.
This returns the handle required for all other operations on the
NAMECACHE. Using @code{GNUNET_NAMECACHE_block_cache} clients can insert a
block into the cache.
@code{GNUNET_NAMECACHE_lookup_block} can be used to look up blocks that
were stored in the NAMECACHE. Both operations can be cancelled using
@code{GNUNET_NAMECACHE_cancel}. Note that cancelling a
@code{GNUNET_NAMECACHE_block_cache} operation can result in the block
being stored in the NAMECACHE --- or not. Cancellation primarily ensures
that the continuation function with the result of the operation will no
longer be invoked.
Finally, @code{GNUNET_NAMECACHE_disconnect} closes the connection to the
NAMECACHE.

The maximum size of a block that can be stored in the NAMECACHE is
@code{GNUNET_NAMECACHE_MAX_VALUE_SIZE}, which is defined to be 63 kB.

@node The NAMECACHE Client-Service Protocol
@subsection The NAMECACHE Client-Service Protocol

@c %**end of header

All messages in the NAMECACHE IPC protocol start with the
@code{struct GNUNET_NAMECACHE_Header} which adds a request
ID (32-bit integer) to the standard message header.
The request ID is used to match requests with the
respective responses from the NAMECACHE, as they are allowed to happen
out-of-order.
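The dispatch idea behind the request ID can be illustrated with a small
sketch. This is not GNUnet's actual client code; the table layout and
the function names are assumptions chosen for illustration, and only
the idea of matching out-of-order responses on the 32-bit request ID is
taken from the protocol description above.

```c
#include <stdint.h>
#include <stddef.h>

/* Illustrative sketch: a client keeps pending operations in a table
 * indexed by the 32-bit request ID carried in every NAMECACHE message,
 * so responses can be matched even when they arrive out of order. */
#define MAX_PENDING 16

struct PendingOp
{
  uint32_t request_id;                 /* ID copied into the message header */
  int in_use;
  void (*cont) (uint32_t request_id);  /* continuation to invoke on reply */
};

static struct PendingOp pending[MAX_PENDING];
static uint32_t next_request_id;

/* Register a new operation and return the request ID to send (0: table full). */
static uint32_t
register_op (void (*cont) (uint32_t))
{
  for (size_t i = 0; i < MAX_PENDING; i++)
    if (! pending[i].in_use)
    {
      pending[i].request_id = ++next_request_id;
      pending[i].in_use = 1;
      pending[i].cont = cont;
      return pending[i].request_id;
    }
  return 0;
}

/* Dispatch a response to the matching pending operation, if any. */
static int
handle_response (uint32_t request_id)
{
  for (size_t i = 0; i < MAX_PENDING; i++)
    if (pending[i].in_use && (pending[i].request_id == request_id))
    {
      pending[i].in_use = 0;
      pending[i].cont (request_id);
      return 1;
    }
  return 0; /* unknown ID, e.g. operation already cancelled */
}
```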


@menu
* Lookup::
* Store::
@end menu

@node Lookup
@subsubsection Lookup

@c %**end of header

The @code{struct LookupBlockMessage} is used to lookup a block stored in
the cache.
It contains the query hash. The NAMECACHE always responds with a
@code{struct LookupBlockResponseMessage}. If the NAMECACHE has no
response, it sets the expiration time in the response to zero.
Otherwise, the response is expected to contain the expiration time, the
ECDSA signature, the derived key and the (variable-size) encrypted data
of the block.
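The "expiration time zero means no result" convention can be sketched
as follows; the struct layout and field names here are assumptions for
illustration, not the exact wire format:

```c
#include <stdint.h>

/* Illustrative sketch of the fixed part of a LookupBlockResponseMessage
 * (field names are assumptions, not the exact wire format).  A response
 * with an expiration time of zero means the NAMECACHE had no block for
 * the query; otherwise the signature, derived key and the encrypted
 * payload (variable size, following this header) are meaningful. */
struct LookupResponseFixed
{
  uint64_t expiration_time; /* zero: negative result */
  /* ... ECDSA signature, derived public key, then encrypted data ... */
};

/* Returns 1 if the response carries a cached block, 0 if it is a
 * "not found" answer. */
static int
response_has_block (const struct LookupResponseFixed *r)
{
  return 0 != r->expiration_time;
}
```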

@node Store
@subsubsection Store

@c %**end of header

The @code{struct BlockCacheMessage} is used to cache a block in the
NAMECACHE.
It has the same structure as the @code{struct LookupBlockResponseMessage}.
The service responds with a @code{struct BlockCacheResponseMessage} which
contains the result of the operation (success or failure).
In the future, we might want to make it possible to provide an error
message as well.

@node The NAMECACHE Plugin API
@subsection The NAMECACHE Plugin API
@c %**end of header

The NAMECACHE plugin API consists of two functions: @code{cache_block} to
store a block in the database, and @code{lookup_block} to look up a block
in the database.


@menu
* Lookup2::
* Store2::
@end menu

@node Lookup2
@subsubsection Lookup2

@c %**end of header

The @code{lookup_block} function is expected to return at most one block
to the iterator, and return @code{GNUNET_NO} if there were no non-expired
results.
If there are multiple non-expired results in the cache, the lookup is
supposed to return the result with the largest expiration time.
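The selection rule can be sketched with a simplified in-memory stand-in
(not a real database plugin; the function name and representation are
illustrative): among the non-expired candidates, pick the one with the
largest expiration time, and signal "no result" otherwise.

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the lookup_block selection rule: return the index of the
 * non-expired block with the largest expiration time, or -1 if there
 * is none (a real plugin would then return GNUNET_NO). */
static int
select_block (const uint64_t *expirations, size_t n, uint64_t now)
{
  int best = -1;
  for (size_t i = 0; i < n; i++)
  {
    if (expirations[i] <= now)
      continue; /* expired, skip */
    if ((-1 == best) || (expirations[i] > expirations[(size_t) best]))
      best = (int) i;
  }
  return best;
}
```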

@node Store2
@subsubsection Store2

@c %**end of header

The @code{cache_block} function is expected to try to store the block in
the database, and return @code{GNUNET_SYSERR} if this was not possible
for any reason.
Furthermore, @code{cache_block} is expected to implicitly perform cache
maintenance and purge blocks from the cache that have expired. Note that
@code{cache_block} might encounter the case where the database already has
another block stored under the same key. In this case, the plugin must
ensure that the block with the larger expiration time is preserved.
Obviously, this can be done either by simply adding new blocks and
selecting the one with the most recent expiration time during lookup, or
by checking which block is more recent during the store operation.
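The second strategy (deciding at store time) can be sketched as
follows; a real plugin would key entries by the query hash in a
database, while here a single slot and all names are illustrative
stand-ins:

```c
#include <stdint.h>
#include <string.h>

/* Simplified sketch of the store-time comparison: on cache_block,
 * keep whichever block (existing or new) expires later.  One slot
 * stands in for one database key here. */
struct CacheSlot
{
  int occupied;
  uint64_t expiration;
  char data[64]; /* stand-in for the encrypted block */
};

/* Returns 1 if the new block was stored, 0 if the existing block
 * (with the larger expiration time) was preserved instead. */
static int
cache_block_sketch (struct CacheSlot *slot,
                    uint64_t expiration,
                    const char *data)
{
  if (slot->occupied && (slot->expiration >= expiration))
    return 0; /* existing block expires later: preserve it */
  slot->occupied = 1;
  slot->expiration = expiration;
  strncpy (slot->data, data, sizeof (slot->data) - 1);
  slot->data[sizeof (slot->data) - 1] = '\0';
  return 1;
}
```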

@cindex REVOCATION Subsystem
@node REVOCATION Subsystem
@section REVOCATION Subsystem
@c %**end of header

The REVOCATION subsystem is responsible for key revocation of Egos.
If a user learns that their private key has been compromised, or if they
have lost it, they can use the REVOCATION system to inform all of the
other users that the key is no longer valid.
The subsystem thus includes ways to query for the validity of keys and to
propagate revocation messages.

@menu
* Dissemination::
* Revocation Message Design Requirements::
* libgnunetrevocation::
* The REVOCATION Client-Service Protocol::
* The REVOCATION Peer-to-Peer Protocol::
@end menu

@node Dissemination
@subsection Dissemination

@c %**end of header

When a revocation is performed, the revocation is first of all
disseminated by flooding the overlay network.
The goal is to reach every peer, so that when a peer needs to check if a
key has been revoked, this will be purely a local operation where the
peer looks at its local revocation list. Flooding the network is also the
most robust form of key revocation --- an adversary would have to control
a separator of the overlay graph to restrict the propagation of the
revocation message. Flooding is also very easy to implement --- peers that
receive a revocation message for a key that they have never seen before
simply pass the message to all of their neighbours.
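The "forward if unseen" rule described above can be modelled with a toy
flood over a small graph (peers are integers, adjacency is a matrix;
none of this is GNUnet code, only the propagation rule is taken from
the text):

```c
/* Toy model of revocation flooding: a peer that receives a revocation
 * it has not seen before remembers it and forwards it to all
 * neighbours.  Returns the number of peers reached from the origin. */
#define N_PEERS 5

static int
flood (int adj[N_PEERS][N_PEERS], int origin)
{
  int seen[N_PEERS] = { 0 };
  int queue[N_PEERS];
  int head = 0, tail = 0, reached = 0;

  seen[origin] = 1;
  queue[tail++] = origin;
  while (head < tail)
  {
    int p = queue[head++];
    reached++;
    for (int q = 0; q < N_PEERS; q++)
      if (adj[p][q] && ! seen[q])
      {
        seen[q] = 1;       /* first time this peer sees the message */
        queue[tail++] = q; /* ... so it passes it on to this neighbour */
      }
  }
  return reached;
}
```

This also illustrates the separator argument: removing all edges to a
peer (a trivial separator) is the only way to keep the message from it.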

Flooding can only distribute the revocation message to peers that are
online.
In order to notify peers that join the network later, the revocation
service performs efficient set reconciliation over the sets of known
revocation messages whenever two peers (that both support REVOCATION
dissemination) connect.
The SET service is used to perform this operation efficiently.

@node Revocation Message Design Requirements
@subsection Revocation Message Design Requirements

@c %**end of header

However, flooding is also quite costly, creating O(|E|) messages on a
network with |E| edges.
Thus, revocation messages are required to contain a proof-of-work, the
result of an expensive computation (which, however, is cheap to verify).
Only peers that have expended the CPU time necessary to provide
this proof will be able to flood the network with the revocation message.
This ensures that an attacker cannot simply flood the network with
millions of revocation messages. The proof-of-work required by GNUnet is
set to take days on a typical PC to compute; if the ability to quickly
revoke a key is needed, users have the option to pre-compute revocation
messages, store them off-line, and use them instantly after their key
has been compromised.

Revocation messages must also be signed by the private key that is being
revoked. Thus, they can only be created while the private key is in the
possession of the respective user. This is another reason to create a
revocation message ahead of time and store it in a secure location.

@node libgnunetrevocation
@subsection libgnunetrevocation

@c %**end of header

The REVOCATION API consists of two parts, to query and to issue
revocations.


@menu
* Querying for revoked keys::
* Preparing revocations::
* Issuing revocations::
@end menu

@node Querying for revoked keys
@subsubsection Querying for revoked keys

@c %**end of header

@code{GNUNET_REVOCATION_query} is used to check if a given ECDSA public
key has been revoked.
The given callback will be invoked with the result of the check.
The query can be cancelled using @code{GNUNET_REVOCATION_query_cancel} on
the return value.

@node Preparing revocations
@subsubsection Preparing revocations

@c %**end of header

It is often desirable to create a revocation record ahead-of-time and
store it in an off-line location to be used later in an emergency.
This is particularly true for GNUnet revocations, where performing the
revocation operation itself is computationally expensive and thus is
likely to take some time.
Thus, if users want the ability to perform revocations quickly in an
emergency, they must pre-compute the revocation message.
The revocation API enables this with two functions that are used to
compute the revocation message, but not trigger the actual revocation
operation.

@code{GNUNET_REVOCATION_check_pow} should be used to calculate the
proof-of-work required in the revocation message. This function takes the
public key, the required number of bits for the proof of work (which in
GNUnet is a network-wide constant) and finally a proof-of-work number as
arguments.
The function then checks if the given proof-of-work number is a valid
proof of work for the given public key. Clients preparing a revocation
are expected to call this function repeatedly (typically with a
monotonically increasing sequence of candidate proof-of-work numbers)
until a given number satisfies the check.
That number should then be saved for later use in the revocation
operation.
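The calling pattern can be illustrated with a self-contained toy. The
hash below (a splitmix64-style mixer) and the leading-zero-bits
criterion merely stand in for GNUnet's actual proof-of-work check,
which is different; only the repeated search with monotonically
increasing candidates mirrors how @code{GNUNET_REVOCATION_check_pow}
is used.

```c
#include <stdint.h>

/* Toy 64-bit mixer (splitmix64 finalizer), standing in for the real,
 * much more expensive proof-of-work hash. */
static uint64_t
toy_hash (uint64_t x)
{
  x += 0x9e3779b97f4a7c15ULL;
  x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
  x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
  return x ^ (x >> 31);
}

/* Toy validity check: the hash of (key_seed, candidate) must have at
 * least `bits` leading zero bits (1 <= bits <= 63). */
static int
toy_check_pow (uint64_t key_seed, uint64_t candidate, unsigned int bits)
{
  uint64_t h = toy_hash (key_seed ^ toy_hash (candidate));
  return 0 == (h >> (64 - bits));
}

/* Search loop: try monotonically increasing candidates until one
 * satisfies the check, then save that number for the revocation. */
static uint64_t
find_pow (uint64_t key_seed, unsigned int bits)
{
  uint64_t candidate = 0;
  while (! toy_check_pow (key_seed, candidate, bits))
    candidate++;
  return candidate;
}
```

With a difficulty of a few bits this terminates almost instantly; the
network-wide constant used by GNUnet makes the same loop take days.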

@code{GNUNET_REVOCATION_sign_revocation} is used to generate the
signature that is required in a revocation message.
It takes the private key that (possibly in the future) is to be revoked
and returns the signature.
The signature can again be saved to disk for later use, which will then
allow performing a revocation even without access to the private key.

@node Issuing revocations
@subsubsection Issuing revocations


Given an ECDSA public key, the signature from
@code{GNUNET_REVOCATION_sign_revocation} and the proof-of-work,
@code{GNUNET_REVOCATION_revoke} can be used to perform the
actual revocation. The given callback is called upon completion of the
operation. @code{GNUNET_REVOCATION_revoke_cancel} can be used to stop the
library from calling the continuation; however, in that case it is
undefined whether or not the revocation operation will be executed.

@node The REVOCATION Client-Service Protocol
@subsection The REVOCATION Client-Service Protocol


The REVOCATION protocol consists of four simple messages.

A @code{QueryMessage} containing a public ECDSA key is used to check if a
particular key has been revoked. The service responds with a
@code{QueryResponseMessage} which simply contains a bit that says if the
given public key is still valid, or if it has been revoked.

The second possible interaction is for a client to revoke a key by
passing a @code{RevokeMessage} to the service. The @code{RevokeMessage}
contains the ECDSA public key to be revoked, a signature by the
corresponding private key and the proof-of-work. The service responds
with a @code{RevocationResponseMessage} which can be used to indicate
that the @code{RevokeMessage} was invalid (i.e. the proof-of-work was incorrect),
or otherwise indicates that the revocation has been processed
successfully.

@node The REVOCATION Peer-to-Peer Protocol
@subsection The REVOCATION Peer-to-Peer Protocol

@c %**end of header

Revocation uses two disjoint ways to spread revocation information among
peers.
First of all, P2P gossip exchanged via CORE-level neighbours is used to
quickly spread revocations to all connected peers.
Second, whenever two peers (that both support revocations) connect,
the SET service is used to compute the union of the respective revocation
sets.

In both cases, the exchanged messages are @code{RevokeMessage}s which
contain the public key that is being revoked, a matching ECDSA signature,
and a proof-of-work.
Whenever a peer learns about a new revocation this way, it first
validates the signature and the proof-of-work, then stores it to disk
(typically to a file $GNUNET_DATA_HOME/revocation.dat) and finally
spreads the information to all directly connected neighbours.

For computing the union using the SET service, the peer with the smaller
hashed peer identity will connect (as a "client" in the two-party set
protocol) to the other peer after one second (to reduce traffic spikes
on connect) and initiate the computation of the set union.
All revocation services use a common hash to identify the SET operation
over revocation sets.

The current implementation accepts revocation set union operations from
all peers at any time; however, well-behaved peers should only initiate
this operation once after establishing a connection to a peer with a
larger hashed peer identity.
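The initiator rule can be sketched in a few lines; the hash length and
function name are illustrative stand-ins (GNUnet compares full hashed
peer identities), but the comparison itself matches the rule above:

```c
#include <stdint.h>
#include <string.h>

/* Sketch of the set-union initiator rule: each side hashes its peer
 * identity, and the peer with the smaller hash acts as the "client"
 * in the two-party set protocol. */
#define HASH_LEN 8

/* Returns 1 if the peer with hash `mine` should initiate the set
 * reconciliation with the peer whose hash is `theirs`. */
static int
should_initiate (const uint8_t mine[HASH_LEN],
                 const uint8_t theirs[HASH_LEN])
{
  return memcmp (mine, theirs, HASH_LEN) < 0;
}
```

Because the comparison is antisymmetric, exactly one of the two peers
initiates, which is what keeps well-behaved peers from starting the
operation twice.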

@cindex FS
@cindex FS Subsystem
@node File-sharing (FS) Subsystem
@section File-sharing (FS) Subsystem

@c %**end of header

This chapter describes the details of how the file-sharing service works.
As with all services, it is split into an API (libgnunetfs), the service
process (gnunet-service-fs) and user interface(s).
The file-sharing service uses the datastore service to store blocks and
the DHT (and indirectly datacache) for lookups for non-anonymous
file-sharing.
Furthermore, the file-sharing service uses the block library (and the
block fs plugin) for validation of DHT operations.

In contrast to many other services, libgnunetfs is rather complex since
the client library includes a large number of high-level abstractions;
this is necessary since the FS service itself largely only operates on
the block level.
The FS library is responsible for providing a file-based abstraction to
applications, including directories, meta data, keyword search,
verification, and so on.

The method used by GNUnet to break large files into blocks and to use
keyword search is called the
"Encoding for Censorship Resistant Sharing" (ECRS).
ECRS is largely implemented in the fs library; block validation is also
reflected in the block FS plugin and the FS service.
ECRS on-demand encoding is implemented in the FS service.

NOTE: The documentation in this chapter is quite incomplete.

@menu
* Encoding for Censorship-Resistant Sharing (ECRS)::
* File-sharing persistence directory structure::
@end menu

@cindex ECRS
@cindex Encoding for Censorship-Resistant Sharing
@node Encoding for Censorship-Resistant Sharing (ECRS)
@subsection Encoding for Censorship-Resistant Sharing (ECRS)

@c %**end of header

When GNUnet shares files, it uses a content encoding that is called ECRS,
the Encoding for Censorship-Resistant Sharing.
Most of ECRS is described in the (so far unpublished) research paper
attached to this page. ECRS obsoletes the previous ESED and ESED II
encodings which were used in GNUnet before version 0.7.0.
The rest of this page assumes that the reader is familiar with the
attached paper. What follows is a description of some minor extensions
that GNUnet makes over what is described in the paper.
The reason why these extensions are not in the paper is that we felt
that they were obvious or trivial extensions to the original scheme and
thus did not warrant space in the research report.

@menu
* Namespace Advertisements::
* KSBlocks::
@end menu

@node Namespace Advertisements
@subsubsection Namespace Advertisements

@c %**end of header
@c %**FIXME: all zeroses -> ?

An @code{SBlock} with identifier all zeros is a signed
advertisement for a namespace. This special @code{SBlock} contains
metadata describing the content of the namespace.
Instead of the name of the identifier for a potential update, it contains
the identifier for the root of the namespace.
The URI should always be empty. The @code{SBlock} is signed with the
content provider's RSA private key (just like any other @code{SBlock}).
Peers can search for @code{SBlock}s in order to find out more about a
namespace.

@node KSBlocks
@subsubsection KSBlocks

@c %**end of header

GNUnet implements @code{KSBlocks} which are @code{KBlocks} that, instead
of encrypting a CHK and metadata, encrypt an @code{SBlock} instead.
In other words, @code{KSBlocks} enable GNUnet to find @code{SBlocks}
using the global keyword search.
Usually the encrypted @code{SBlock} is a namespace advertisement.
The rationale behind @code{KSBlock}s and @code{SBlock}s is to enable
peers to discover namespaces via keyword searches, and, to associate
useful information with namespaces. When GNUnet finds @code{KSBlocks}
during a normal keyword search, it adds the information to an internal
list of discovered namespaces. Users looking for interesting namespaces
can then inspect this list, reducing the need for out-of-band discovery
of namespaces.
Naturally, namespaces (or more specifically, namespace advertisements) can
also be referenced from directories, but @code{KSBlock}s should make it
easier to advertise namespaces for the owner of the pseudonym since they
eliminate the need to first create a directory.

Collections are also advertised using @code{KSBlock}s.

@table @asis
@item Attachment Size
@item  ecrs.pdf 270.68 KB
@item https://gnunet.org/sites/default/files/ecrs.pdf
@end table

@node File-sharing persistence directory structure
@subsection File-sharing persistence directory structure

@c %**end of header

This section documents how the file-sharing library implements
persistence of file-sharing operations and specifically the resulting
directory structure.
This code is only active if the @code{GNUNET_FS_FLAGS_PERSISTENCE} flag
was set when calling @code{GNUNET_FS_start}.
In this case, the file-sharing library will try hard to ensure that all
major operations (searching, downloading, publishing, unindexing) are
persistent, that is, can live longer than the process itself.
More specifically, an operation is supposed to live until it is
explicitly stopped.

If @code{GNUNET_FS_stop} is called before an operation has been stopped, a
@code{SUSPEND} event is generated, and when the process calls
@code{GNUNET_FS_start} the next time, a @code{RESUME} event is generated.
Additionally, even if an application crashes (segfault, SIGKILL, system
crash) and hence @code{GNUNET_FS_stop} is never called and no
@code{SUSPEND} events are generated, operations are still resumed (with
@code{RESUME} events).
This is implemented by constantly writing the current state of the
file-sharing operations to disk.
Specifically, the current state is always written to disk whenever
anything significant changes (the exception is block-wise progress in
publishing and unindexing, since those operations would be slowed down
significantly and can be resumed cheaply even without detailed
accounting).
Note that if the process crashes (or is killed) during a serialization
operation, FS does not guarantee that this specific operation is
recoverable (no strict transactional semantics, again for performance
reasons). However, all other unrelated operations should resume nicely.

Since we need to serialize the state continuously and want to recover as
much as possible even after crashing during a serialization operation,
we do not use one large file for serialization.
Instead, several directories are used for the various operations.
When @code{GNUNET_FS_start} executes, the master directories are scanned
for files describing operations to resume.
Sometimes, these operations can refer to related operations in child
directories which may also be resumed at this point.
Note that corrupted files are cleaned up automatically.
However, dangling files in child directories (those that are not
referenced by files from the master directories) are not automatically
removed.

Persistence data is kept in a directory that begins with the "STATE_DIR"
prefix from the configuration file
(by default, "$SERVICEHOME/persistence/") followed by the name of the
client as given to @code{GNUNET_FS_start} (for example, "gnunet-gtk")
followed by the actual name of the master or child directory.
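The naming scheme just described can be sketched with a small helper;
the function is illustrative only (the real library uses its own path
helpers), but the STATE_DIR / client-name / directory layout follows
the text:

```c
#include <stdio.h>
#include <stddef.h>

/* Sketch of the persistence path layout:
 *   STATE_DIR "/" client-name "/" master-or-child-directory
 * Returns 1 on success, 0 if the buffer was too small. */
static int
make_persistence_path (char *buf, size_t len,
                       const char *state_dir, /* e.g. "$SERVICEHOME/persistence" */
                       const char *client,    /* e.g. "gnunet-gtk" */
                       const char *dir)       /* e.g. "search" */
{
  int n = snprintf (buf, len, "%s/%s/%s", state_dir, client, dir);
  return (n > 0) && ((size_t) n < len);
}
```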

The names for the master directories follow the names of the operations:

@itemize @bullet
@item "search"
@item "download"
@item "publish"
@item "unindex"
@end itemize

Each of the master directories contains names (chosen at random) for each
active top-level (master) operation.
Note that a download that is associated with a search result is not a
top-level operation.

In contrast to the master directories, the child directories are only
consulted when another operation refers to them.
For each search, a subdirectory (named after the master search
synchronization file) contains the search results.
Search results can have an associated download, which is then stored in
the general "download-child" directory.
Downloads can be recursive, in which case children are stored in
subdirectories mirroring the structure of the recursive download
(either starting in the master "download" directory or in the
"download-child" directory depending on how the download was initiated).
For publishing operations, the "publish-file" directory contains
information about the individual files and directories that are part of
the publication.
However, this directory structure is flat and does not mirror the
structure of the publishing operation.
Note that unindex operations cannot have associated child operations.

@cindex REGEX subsystem
@node REGEX Subsystem
@section REGEX Subsystem

@c %**end of header

Using the REGEX subsystem, you can discover peers that offer a particular
service using regular expressions.
The peers that offer a service specify it using a regular expression.
Peers that want to patronize a service search using a string.
The REGEX subsystem will then use the DHT to return a set of matching
offerers to the patrons.

For the technical details, we have Max's defense talk and Max's Master's
thesis.

@c An additional publication is under preparation and available to
@c team members (in Git).
@c FIXME: Where is the file? Point to it. Assuming that it's szengel2012ms

@menu
* How to run the regex profiler::
@end menu

@node How to run the regex profiler
@subsection How to run the regex profiler

@c %**end of header

The gnunet-regex-profiler can be used to profile the usage of mesh/regex
for a given set of regular expressions and strings.
Mesh/regex allows you to announce your peer ID under a certain regex and
search for peers matching a particular regex using a string.
See @uref{https://gnunet.org/szengel2012ms, szengel2012ms} for a full
introduction.

First of all, the regex profiler uses GNUnet testbed, thus all the
implications for testbed also apply to the regex profiler
(for example you need password-less ssh login to the machines listed in
your hosts file).

@strong{Configuration}

Moreover, an appropriate configuration file is needed.
Generally you can refer to the
@file{contrib/regex_profiler_infiniband.conf} file in the sourcecode
of GNUnet for an example configuration.
In the following paragraphs, the important details are highlighted.

Announcing the regular expressions is done by the
gnunet-daemon-regexprofiler; therefore, you have to make sure it is
started by adding it to the AUTOSTART set of ARM:

@example
[regexprofiler]
AUTOSTART = YES
@end example

@noindent
Furthermore you have to specify the location of the binary:

@example
[regexprofiler]
# Location of the gnunet-daemon-regexprofiler binary.
BINARY = /home/szengel/gnunet/src/mesh/.libs/gnunet-daemon-regexprofiler
# Regex prefix that will be applied to all regular expressions and
# search strings.
REGEX_PREFIX = "GNVPN-0001-PAD"
@end example

@noindent
When running the profiler with a large scale deployment, you probably
want to reduce the workload of each peer.
Use the following options to do this.

@example
[dht]
# Force network size estimation
FORCE_NSE = 1

[dhtcache]
DATABASE = heap
# Disable RC-file for Bloom filter? (for benchmarking with limited IO
# availability)
DISABLE_BF_RC = YES
# Disable Bloom filter entirely
DISABLE_BF = YES

[nse]
# Minimize proof-of-work CPU consumption by NSE
WORKBITS = 1
@end example

@noindent
@strong{Options}

To finally run the profiler some options and the input data need to be
specified on the command line.

@example
gnunet-regex-profiler -c config-file -d log-file -n num-links \
-p path-compression-length -s search-delay -t matching-timeout \
-a num-search-strings hosts-file policy-dir search-strings-file
@end example

@noindent
Where...

@itemize @bullet
@item ... @code{config-file} means the configuration file created earlier.
@item ... @code{log-file} is the file where to write statistics output.
@item ... @code{num-links} indicates the number of random links between
started peers.
@item ... @code{path-compression-length} is the maximum path compression
length in the DFA.
@item ... @code{search-delay} is the time to wait between peers
finishing linking and starting to match strings.
@item ... @code{matching-timeout} is the timeout after which to cancel
the search.
@item ... @code{num-search-strings} is the number of strings in the
search-strings-file.
@item ... the @code{hosts-file} should contain a list of hosts for the
testbed, one per line in the following format:

@itemize @bullet
@item @code{user@@host_ip:port}
@end itemize
@item ... the @code{policy-dir} is a folder containing text files
containing one or more regular expressions. A peer is started for each
file in that folder and the regular expressions in the corresponding file
are announced by this peer.
@item ... the @code{search-strings-file} is a text file containing search
strings, one in each line.
@end itemize

@noindent
You can create regular expressions and search strings for every AS in the
Internet using the attached scripts. You need one of the
@uref{http://data.caida.org/datasets/routing/routeviews-prefix2as/, CAIDA routeviews prefix2as}
data files for this. Run

@example
create_regex.py <filename> <output path>
@end example

@noindent
to create the regular expressions and

@example
create_strings.py <input path> <outfile>
@end example

@noindent
to create a search strings file from the previously created
regular expressions.
